Walter0807 / MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
Apache License 2.0

Computational resources for replicating pre-training #71

Closed suyashkumar2409 closed 1 year ago

suyashkumar2409 commented 1 year ago

Hi! I am an AI researcher at Georgia Tech, and we have decided to replicate your results and develop them further. We are currently estimating the feasibility of this endeavour given our limited time and computational resources, and were wondering whether you could guide us on what computational resources, time, and cost it would take to train the pre-trained model from scratch.

While the paper does mention the use of 8 V100 GPUs, the training time is not mentioned, hence we can't calculate the cost involved either.

Depending on the answer, we want to decide whether to extend your work from a pretraining perspective or a fine-tuning perspective.

Walter0807 commented 1 year ago

Hi, pretraining takes about 2-3 days, and fine-tuning takes about 4 hours to 1 day, depending on the task. Hope this helps.
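
For a rough budget, a back-of-the-envelope sketch based on these numbers. The hourly rate is an assumption for illustration, not something stated in this thread; substitute your provider's actual pricing:

```python
# Back-of-the-envelope pretraining cost estimate.
# Assumptions: 8 V100s for 3 days (from the reply above) and a
# hypothetical on-demand rate of ~$2.50 per V100-hour (not from this thread).
num_gpus = 8
days = 3
usd_per_gpu_hour = 2.50  # assumed rate; check your cloud provider

gpu_hours = num_gpus * days * 24      # 576 GPU-hours
cost = gpu_hours * usd_per_gpu_hour   # ~$1,440 at the assumed rate
print(f"{gpu_hours} GPU-hours, ~${cost:,.0f}")
```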

suyashkumar2409 commented 1 year ago

Thank you for responding!

In the docs I saw that there is a main model and a lite model, and that the results between them aren't significantly different. How much training does the lite model take?

Walter0807 commented 1 year ago

The speed difference is not very large, but the lite model requires less GPU memory.
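
If you want to compare the two variants yourself, a minimal sketch for counting trainable parameters. `load_backbone` is a hypothetical placeholder; instantiate the main and lite models however the repo README describes:

```python
import torch

def count_trainable_params(model: torch.nn.Module) -> int:
    """Sum the element counts of all trainable tensors."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical usage -- load each backbone per the repo README, then compare:
# main_model = load_backbone(main_config)   # placeholder loader, not repo API
# lite_model = load_backbone(lite_config)
# print(count_trainable_params(main_model), count_trainable_params(lite_model))
```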

aarshp commented 1 year ago

Hi @Walter0807, does fine-tuning the lite model require 8 V100s, or can it be achieved with something like just one V100?

Thanks

Walter0807 commented 1 year ago

No, fine-tuning does not require 8 cards; it also depends on your task.
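
For reference, a minimal sketch of restricting fine-tuning to a single GPU. The config path and flags below follow the repo README at the time of writing and may differ in your checkout; verify with `python train.py --help`:

```python
import os
import subprocess

# Expose only one GPU to the process; set this before CUDA initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Fine-tune the 3D pose task on a single card. Paths/flags are taken from
# the repo README and should be adjusted to your local setup.
subprocess.run(
    [
        "python", "train.py",
        "--config", "configs/pose3d/MB_ft_h36m.yaml",
        "--pretrained", "checkpoint/pretrain/MB_release",
        "--checkpoint", "checkpoint/pose3d/FT_MB_ft_h36m",
    ],
    check=True,
)
```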