When I run
sh ./scripts/tune_script/graphgpt_stage2.sh
I encounter an error, and the error message is as follows:
raise ValueError("Can't find a valid checkpoint at {resume_from_checkpoint}")
ValueError: Can't find a valid checkpoint at /data1/path/checkpoints/stage_2/checkpoint-50000
I have checked the contents of
/data1/path/checkpoints/stage_2/checkpoint-50000
and listed the following files:
config.json
pytorch_model-00001-of-00003.bin
rng_state_1.pth
generation_config.json
pytorch_model-00002-of-00003.bin
I would like to ask if anyone has encountered a similar issue where the checkpoint files exist, but the script reports that it cannot find them.
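To help narrow this down, here is a minimal diagnostic sketch. It assumes the resume logic looks for one of the weight file names used by recent Hugging Face transformers releases (the names in WEIGHT_MARKERS are that assumption, and may differ in the version this script uses). The helper names find_weight_markers, missing_shards and inspect_checkpoint are hypothetical and are not part of GraphGPT or transformers.

```python
import re
from pathlib import Path

# Assumption: file names that mark a loadable checkpoint in recent
# Hugging Face transformers releases; other versions may differ.
WEIGHT_MARKERS = (
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
    "model.safetensors",
    "model.safetensors.index.json",
)

# Matches sharded weight files such as pytorch_model-00001-of-00003.bin
SHARD_RE = re.compile(r"^pytorch_model-(\d{5})-of-(\d{5})\.bin$")


def find_weight_markers(names):
    """Return the marker files that appear in an iterable of file names."""
    present = set(names)
    return [marker for marker in WEIGHT_MARKERS if marker in present]


def missing_shards(names):
    """Return shard files implied by the '-of-NNNNN' suffix but not present."""
    present = set(names)
    totals = set()
    for name in present:
        match = SHARD_RE.match(name)
        if match:
            totals.add(int(match.group(2)))
    missing = []
    for total in sorted(totals):
        for index in range(1, total + 1):
            expected = f"pytorch_model-{index:05d}-of-{total:05d}.bin"
            if expected not in present:
                missing.append(expected)
    return missing


def inspect_checkpoint(path):
    """List a checkpoint directory and report markers found and shards missing."""
    names = [entry.name for entry in Path(path).iterdir()]
    return find_weight_markers(names), missing_shards(names)
```

Calling inspect_checkpoint("/data1/path/checkpoints/stage_2/checkpoint-50000") returns a pair: the marker files found and the shards that seem to be missing. Applied to the five file names I listed, the sketch finds no marker file and reports pytorch_model-00003-of-00003.bin as missing, which might mean the checkpoint was only partially saved, but that is a guess on my part.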