Ner0oooooo closed this issue 3 years ago
Hello @Ner0oooooo ,
Thanks for your interest! Please ensure that --checkpoint points to a valid checkpoint file.
The command shown in train_simmc_model.sh is only an illustration, and the actual paths might have to be adjusted. Feel free to reopen this issue if the problem persists.
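As a rough illustration of the check suggested above, the path could be validated before it is handed to torch.load, so that a wrong --checkpoint value fails with a clearer message. This is only a sketch: resolve_checkpoint is a hypothetical helper and is not part of eval_simmc_agent.py.

```python
import os


def resolve_checkpoint(path):
    """Return the absolute path of an existing checkpoint file.

    Hypothetical helper (not in the repository): raises FileNotFoundError
    with a hint about --checkpoint when the file does not exist.
    """
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError(
            f"Checkpoint not found: {full_path} (check the --checkpoint argument)"
        )
    return full_path
```

The returned path could then be passed to torch.load in place of the raw argument.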
When I ran train_simmc_model.sh, I ran into a problem. After training finished and evaluation began, the following error occurred:
Traceback (most recent call last):
File "eval_simmc_agent.py", line 199, in <module>
main(args)
File "eval_simmc_agent.py", line 24, in main
checkpoint = torch.load(args["checkpoint"], map_location=torch.device("cpu"))
File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 234, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/chenrj/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/serialization.py", line 215, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/hae/epoch_20.tar'
I found that the checkpoints are saved in mm_action_prediction/checkpoints, and the last file is epoch_100.tar. The script scripts/train_simmc_model.sh does not seem to define the variable CHECKPOINT_ROOT, so it appears to expand to an empty string and the path becomes /hae/epoch_20.tar, which leads to the error above.
# Evaluate a trained model checkpoint.
CHECKPOINT_PATH="${CHECKPOINT_ROOT}/hae/epoch_20.tar"
python -u eval_simmc_agent.py \
    --eval_data_path=${DEVTEST_JSON_FILE/.json/_mm_inputs.npy} \
    --checkpoint="$CHECKPOINT_PATH" --gpu_id=${GPU_ID} --batch_size=50 \
    --domain="$DOMAIN"
Should I change the path to 'checkpoints/epoch_100.tar' or to 'checkpoints/epoch_20.tar'?
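One way to see which checkpoint files actually exist is to list the directory and pick the highest epoch number. The sketch below assumes the files follow the epoch_<N>.tar naming seen in this thread. latest_checkpoint is a hypothetical helper, not something the repository provides. It compares epoch numbers numerically, because a plain string sort would place epoch_100.tar before epoch_20.tar.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return the Path of the epoch_<N>.tar file with the largest N, or None.

    Hypothetical helper; assumes the epoch_<N>.tar naming seen in this thread.
    """
    pattern = re.compile(r"epoch_(\d+)\.tar")
    best_epoch, best_path = -1, None
    for entry in Path(checkpoint_dir).iterdir():
        match = pattern.fullmatch(entry.name)
        # Skip anything that is not a regular file named epoch_<N>.tar.
        if match is None or not entry.is_file():
            continue
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch, best_path = epoch, entry
    return best_path
```

For example, latest_checkpoint("checkpoints") would return the epoch_100.tar path if that is the highest-numbered file in the directory, and that path could then be passed to --checkpoint.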