Closed — mshavliuk closed this issue 1 month ago
Hello,
I appreciate your interest in our work!
We used either a Tesla V100-SXM2-32GB or an A100-40GB; however, the GPU utilization varies depending on the batch size and some of the hyperparameters.
Best Regards
Thank you for your quick reply!
either Tesla V100-SXM2-32GB or A100-40GB
Could you please also share the approximate training time, or a range? That is, does pretraining take 10-50h, or rather 100h+ (which would unfortunately exceed my research budget)?
Which one are you interested in?
The one pretrained on P19 is the most interesting to me (before supervised fine-tuning). As for the choice of the loss function, it's not clear yet. Could you share all 3? If you wish to continue this as a private conversation, you could email me at mikhail.shavliuk@tuni.fi
In my setup with an RTX 3060 12GB, the pretraining run with batch size 128 took 28 minutes for 50 epochs, while consuming about 6GB of VRAM and utilizing the GPU at about 50%.
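For anyone with a similar budget question, the 28-minutes-for-50-epochs figure can be extrapolated with simple arithmetic. This is only a back-of-the-envelope sketch: it assumes training time scales linearly with the number of epochs, and the epoch counts other than 50 are hypothetical examples, not values from the paper.

```python
# Rough extrapolation of pretraining wall-clock time from the
# RTX 3060 measurement above: 28 minutes for 50 epochs (batch size 128).

measured_minutes = 28
measured_epochs = 50
minutes_per_epoch = measured_minutes / measured_epochs  # 0.56 min/epoch

def estimated_hours(epochs: int) -> float:
    """Estimate total hours, assuming time scales linearly with epochs."""
    return epochs * minutes_per_epoch / 60

# Hypothetical epoch counts, just to see the order of magnitude:
for epochs in (50, 500, 5000):
    print(f"{epochs:>5} epochs -> ~{estimated_hours(epochs):.1f} h")
```

Even at 100x the measured epoch count this stays under the 100h threshold mentioned above, though a different GPU, dataset, or batch size would of course shift the per-epoch time.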
Thank you Mikhail for the update!
Hello,
I have been working with your paper and I am impressed by the methodology and results presented. However, I noticed that the paper does not provide specific details about the computational resources required to run the model. I have a couple of questions that I hope you can help with:
Thank you in advance for your assistance and for sharing your work with the community.