While merging a sharded llama2 7b tp2-pp2 checkpoint, the exception `AttributeError: 'TransformerLanguageModel' object has no attribute 'lm_head'` is thrown (see the traceback below).
Traceback

```
Traceback (most recent call last):
  File "/root/koepf/epfl-megatron/tools/checkpoint_util.py", line 152, in <module>
    main()
  File "/root/koepf/epfl-megatron/tools/checkpoint_util.py", line 145, in main
    loader.load_checkpoint(queue, args)
  File "/root/koepf/epfl-megatron/tools/checkpoint_loader_megatron.py", line 319, in load_checkpoint
    _load_checkpoint(queue, args)
  File "/root/koepf/epfl-megatron/tools/checkpoint_loader_megatron.py", line 221, in _load_checkpoint
    queue_put("lm_head", {"lm_head": torch.cat([models[tp_rank].language_model.lm_head.data
  File "/root/koepf/epfl-megatron/tools/checkpoint_loader_megatron.py", line 221, in <listcomp>
    queue_put("lm_head", {"lm_head": torch.cat([models[tp_rank].language_model.lm_head.data
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1630, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TransformerLanguageModel' object has no attribute 'lm_head'
```
Command used

```bash
python tools/checkpoint_util.py \
  --target_tensor_parallel_size 1 \
  --target_pipeline_parallel_size 1 \
  --load_dir /root/koepf/megatron-data/checkpoints/llama2-7b-tp2-pp2-trained/ \
  --save_dir /root/koepf/megatron-data/llama2-7b-out \
  --model_type llama2 \
  --bf16
```
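For what it's worth, the crash comes from `nn.Module.__getattr__` raising `AttributeError` when the loader unconditionally reads `language_model.lm_head` on a model that apparently has no such submodule. Below is a minimal sketch of a guard that would let the loader skip the `lm_head` message instead of crashing. This is not the repo's code: `gather_lm_head` is a hypothetical helper, the concatenation `dim=0` is an assumption, and the `output_layer` fallback mentioned in the docstring is based on upstream Megatron-LM and may not apply to this fork.

```python
import torch


def gather_lm_head(models, dim=0):
    """Concatenate lm_head shards across tensor-parallel ranks.

    Returns None when a rank has no separate `lm_head`, e.g. when the
    output projection is tied to the input embeddings or stored under
    another attribute (upstream Megatron-LM uses `output_layer`).
    """
    shards = []
    for model in models:
        # getattr with a default avoids nn.Module.__getattr__ raising
        # AttributeError, which is exactly the crash in the traceback.
        head = getattr(model.language_model, "lm_head", None)
        if head is None:
            return None
        shards.append(head.data)  # the traceback accesses `.data` directly
    # dim=0 is an assumption; use whichever dim the saver split the head on.
    return torch.cat(shards, dim=dim)


# In _load_checkpoint, emit the message only when a head actually exists:
#
#     lm_head = gather_lm_head(models)
#     if lm_head is not None:
#         queue_put("lm_head", {"lm_head": lm_head})
```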