🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
While training a speculator on the specu-train branch, I get an OOM error when loading a checkpoint in HuggingFace format. The checkpoint's model_type is "gpt_megatron". The script works fine with other Llama checkpoints whose model_type is "llama".
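A minimal sketch of one mitigation often used for this kind of load-time OOM: instantiating the model on PyTorch's meta device, so parameters carry shape and dtype but allocate no storage until real checkpoint weights are assigned. The small `nn.Linear` stack below is purely illustrative and is not the actual gpt_megatron model.

```python
import torch
import torch.nn as nn

# Build the module on the meta device: no parameter memory is allocated,
# so even a very large model can be constructed on a small host.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(8192, 8192), nn.Linear(8192, 8192))

# Every parameter is a meta tensor (shape/dtype only, no data).
assert all(p.is_meta for p in model.parameters())

# Later, real weights from a checkpoint shard would be materialized in place,
# e.g. via load_state_dict(..., assign=True) with tensors loaded shard by shard.
```

Whether this helps here depends on whether the loading path for the "gpt_megatron" checkpoint supports deferred materialization the way the "llama" path apparently does.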
Checkpoint folder structure
Observed Error