junsiknss opened this issue 9 months ago
All experiments can be conducted on a single RTX 3090. I'll release some running logs recording correct training and inference runs.
The logs can be downloaded from Google Drive. Thanks for your attention; please feel free to contact us if you have any other questions.
Hello. I just read your paper. It mentions that the extra parameters amount to only ~0.2% (0.12M) of the original model (T5-small: 60M) at inference time, but I couldn't find anything about memory usage when training MELO.
Could you give a rough idea of how much GPU memory is required to train MELO? If I'm misunderstanding the paper, please let me know.
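For reference, a quick sanity check of the overhead figure using only the numbers quoted above (0.12M extra parameters vs. 60M for T5-small):

```python
# Sanity check of the parameter-overhead percentage cited in the paper.
# Numbers are taken from the question above, not measured by me.
extra_params = 0.12e6  # extra parameters added by MELO at inference
base_params = 60e6     # parameter count of T5-small

overhead_pct = extra_params / base_params * 100
print(f"{overhead_pct:.2f}% parameter overhead")  # → 0.20%
```

So the ~0.2% figure checks out for inference-time parameters; my question is about the training-time GPU memory footprint, which this arithmetic doesn't cover.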
Thanks.