bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Finetuning BLOOM-176B with BitFit #359

Closed drxmy closed 1 year ago

drxmy commented 1 year ago

I saw there is a branch with BitFit and wonder how much VRAM it would take to finetune BLOOM-176B. @Muennighoff

Thank you

Muennighoff commented 1 year ago

Hey! I'm not sure exactly how much VRAM it will use, but I'd guess somewhere around 10-40% less than full-parameter training.
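
For context, BitFit trains only the bias terms and freezes everything else, so the memory saved comes mainly from gradients and optimizer state (e.g. Adam moments) that no longer need to be kept for the frozen weights; the full model weights and activations still have to fit. Below is a minimal generic PyTorch sketch of that idea, not the actual implementation in the branch mentioned above; the helper name `apply_bitfit` and the toy model are illustrative only.

```python
import torch


def apply_bitfit(model: torch.nn.Module) -> None:
    """Freeze every parameter except bias terms (the BitFit recipe)."""
    for name, param in model.named_parameters():
        # Only bias vectors remain trainable; all weight matrices are frozen.
        param.requires_grad = name.endswith(".bias") or name == "bias"


# Toy example: only the trainable (bias) parameters are passed to the
# optimizer, so its state is a small fraction of full-parameter training.
model = torch.nn.Linear(1024, 1024)
apply_bitfit(model)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```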

drxmy commented 1 year ago

Thank you!