argonne-lcf / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

convert MDS checkpoint to Hf Llama model #19

Closed vksastry closed 3 months ago

vksastry commented 3 months ago

I thought we might need these if we change the architecture in the future. But if they are hurting, I can remove them.

saforem2 commented 3 months ago

🤷🏻‍♂️ leave 'em in there if they're useful