Closed WeianMao closed 4 months ago
I'm on commit 97d1abb2bca0b5daff6d434c4bb340d3bb702e86.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Has this error been solved? It seems that _load_state_dict_from_disk expects a model.ckpt file, but untarring model.nemo produces a model_weights folder instead.
Same issue. I managed to run the pretraining script by setting model.mm_cfg.llm.from_pretrained=null, and it works, but it seems to be pretraining the LLM from scratch (?)
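To check which layout a given model.nemo actually has, you can list the archive members without extracting them. This is a minimal sketch using only Python's standard tarfile module. The function name `summarize_nemo_archive` is hypothetical and not part of NeMo, and it only reports what is inside the archive; it does not reproduce what _load_state_dict_from_disk does internally.

```python
import tarfile


def summarize_nemo_archive(path):
    """Report whether a .nemo tar archive holds a .ckpt file or a model_weights folder.

    Hypothetical helper for debugging; it only inspects member names.
    """
    # "r:*" lets tarfile detect compression (plain tar, gzip, ...) on its own.
    with tarfile.open(path, "r:*") as tar:
        names = [member.name for member in tar.getmembers()]

    # Some archives store members with a leading "./"; strip it for comparison.
    cleaned = [n[2:] if n.startswith("./") else n for n in names]

    has_ckpt = any(n.endswith(".ckpt") for n in cleaned)
    has_weights_dir = any(
        n == "model_weights" or n.startswith("model_weights/") for n in cleaned
    )
    return {
        "has_ckpt": has_ckpt,
        "has_weights_dir": has_weights_dir,
        "members": cleaned,
    }
```

Calling `summarize_nemo_archive("model.nemo")` on the converted checkpoint and finding `has_weights_dir` true with `has_ckpt` false would match the mismatch described above.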
When I train the Neva model, I get the following error:
Steps/Code to reproduce bug
First, I used the following script to convert the Llama HF checkpoint to a NeMo checkpoint (I tried both Vicuna and Llama, but got the same error):
Then I launched the training process (I tried 1 GPU and 8 GPUs, but got the same error):
Expected behavior
The training should start.
Environment overview (please complete the following information)
I am on the main branch, and I use the following Docker image:
Environment details
I tried to compile NeMo inside the Docker container; however, it does not work.
Additional context
8 H800 GPUs. I'm on commit 97d1abb2bca0b5daff6d434c4bb340d3bb702e86.