ManifoldRG / MultiNet


Infra scoping for fine-tuning #70

Open pranavguru opened 1 month ago

pranavguru commented 1 month ago

Look into:

devjwsong commented 1 month ago

Similar mechanism as https://github.com/ManifoldRG/MultiNet/issues/71#issuecomment-2266368447, but only the inference cost is published, so I applied a multiplication factor (e.g., 4) to the inference time to estimate the training time.

Here is the calculation algorithm for one training epoch per dataset:

  1. Number of iterations: (number of train samples) / (batch size)
  2. Seconds to run one iteration: (training multiplication factor) × (seconds per iteration with Octo-base) × (relative size compared to Octo-base) / (relative performance of the GPU compared to RTX 4090)
    • seconds per iteration with Octo-base: 1/13
    • relative size compared to Octo-base: (profiled model size) / (Octo-base size)
    • relative performance of the GPU compared to RTX 4090: (number of GPUs) × (performance of the GPU) / (performance of RTX 4090)
  3. Cost of training one epoch for one dataset: (Number of iterations) × (Seconds to run one iteration) × (Cost per second of the cloud instance)

Then we calculate the inference cost on the validation set for one epoch with the same algorithm.

Finally, we can get the total fine-tuning cost of one dataset: (Number of epochs) * (training cost per epoch + inference cost per epoch)
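The steps above can be sketched as a few small Python helpers. Only the 1/13 s/iter figure and the ×4 training factor come from this thread; the Octo-base parameter count and GPU throughput defaults below are illustrative placeholders, not measured or published values:

```python
# Sketch of the per-dataset fine-tuning cost estimate described above.
# Default values marked "assumption" are placeholders for illustration only.

def seconds_per_iteration(model_size_b: float,
                          num_gpus: int,
                          gpu_perf_tflops: float,
                          train_factor: float = 4.0,          # from the comment above
                          octo_base_size_b: float = 0.093,    # ~93M params (assumption)
                          octo_sec_per_iter: float = 1 / 13,  # from the comment above
                          rtx4090_perf_tflops: float = 82.6) -> float:
    """Step 2: estimated seconds per training iteration."""
    relative_size = model_size_b / octo_base_size_b
    relative_gpu = num_gpus * gpu_perf_tflops / rtx4090_perf_tflops
    return train_factor * octo_sec_per_iter * relative_size / relative_gpu

def epoch_cost(num_samples: int, batch_size: int,
               sec_per_iter: float, cost_per_second: float) -> float:
    """Steps 1 and 3: iterations per epoch x time per iteration x instance cost."""
    num_iterations = num_samples / batch_size
    return num_iterations * sec_per_iter * cost_per_second

def total_finetune_cost(num_epochs: int,
                        train_cost_per_epoch: float,
                        infer_cost_per_epoch: float) -> float:
    """Total: epochs x (training cost + validation-inference cost) per epoch."""
    return num_epochs * (train_cost_per_epoch + infer_cost_per_epoch)
```

For example, a model the same size as Octo-base on a single RTX 4090 reduces to `4 × (1/13)` seconds per iteration, since both relative factors are 1.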

Here is the code: https://colab.research.google.com/drive/14j5DBPpgk9-Z-h8kQ6lMiiIyilxs1zko?usp=sharing

pranavguru commented 1 month ago

Is a multiplication factor of 4 to the inference time a common rule of thumb when estimating fine-tuning time (secs per iter)?