Traceback (most recent call last):
  File "train.py", line 869, in <module>
    main(args)
  File "train.py", line 836, in main
    result = val(model, epoch, val_dataloader, args)
  File "/home/verigle/miniconda3/envs/DDCap/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "train.py", line 665, in val
    part_result = json.load(open(f"{args.out_dir}/.cache/{args.tag}/tmp-results-{i}.json"))
FileNotFoundError: [Errno 2] No such file or directory: './OUTPUT/checkpoints/caption_diff_vitb16/.cache/caption_diff_vitb16/tmp-results-0.json'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2856941) of binary: /home/verigle/miniconda3/envs/DDCap/bin/python
# Each rank writes relative to the current working directory (no args.out_dir prefix)
os.makedirs(f'.cache/{args.tag}', exist_ok=True)
json.dump(result_all, open(f".cache/{args.tag}/tmp-results-{dist.get_rank()}.json", "w"))
torch.distributed.barrier()
if dist.get_rank() == 0:
    result_all = []
    ra_id = []
    for i in range(dist.get_world_size()):
        # ...but rank 0 reads the files back from under args.out_dir
        part_result = json.load(open(f"{args.out_dir}/.cache/{args.tag}/tmp-results-{i}.json"))
        for ep in part_result:
            if ep['image_id'] not in ra_id:
                ra_id.append(ep['image_id'])
                result_all.append(ep)
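However, the path used when saving the file does not prepend the args.out_dir prefix, so the path the file is written to differs from the path it is later read from.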
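One possible fix is to build the temp-file path in a single place and use it for both the write and the read, so the two can no longer drift apart. The sketch below assumes only that args has out_dir and tag attributes, as in the snippet above; the helper names cache_path, save_partial_results and merge_partial_results are hypothetical and not part of the existing train.py. It also swaps the ra_id list for a set for faster membership checks, which keeps the same first-seen order in the merged list. The demo simulates two ranks without torch.distributed.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def cache_path(args, rank):
    # Hypothetical helper: the only place the per-rank temp file path is built,
    # always rooted at args.out_dir so the writer and the reader agree.
    return os.path.join(args.out_dir, ".cache", args.tag, f"tmp-results-{rank}.json")


def save_partial_results(result_all, args, rank):
    # Each rank writes its partial results under args.out_dir,
    # creating the cache directory first if needed.
    path = cache_path(args, rank)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(result_all, f)


def merge_partial_results(args, world_size):
    # Rank 0 reads every rank's file from the same location it was written to,
    # keeping only the first entry seen for each image_id.
    merged, seen = [], set()
    for i in range(world_size):
        with open(cache_path(args, i)) as f:
            part_result = json.load(f)
        for ep in part_result:
            if ep["image_id"] not in seen:
                seen.add(ep["image_id"])
                merged.append(ep)
    return merged


if __name__ == "__main__":
    # Simulate two ranks in one process; image_id 2 appears in both files.
    with tempfile.TemporaryDirectory() as tmp:
        args = SimpleNamespace(out_dir=tmp, tag="caption_diff_vitb16")
        save_partial_results([{"image_id": 1, "caption": "a"}, {"image_id": 2, "caption": "b"}], args, 0)
        save_partial_results([{"image_id": 2, "caption": "b"}, {"image_id": 3, "caption": "c"}], args, 1)
        print([ep["image_id"] for ep in merge_partial_results(args, 2)])  # [1, 2, 3]
```

Inside val(), the write would then become save_partial_results(result_all, args, dist.get_rank()), followed by the existing torch.distributed.barrier(), and rank 0 would call result_all = merge_partial_results(args, dist.get_world_size()). A smaller alternative is to add the f"{args.out_dir}/" prefix to the os.makedirs and json.dump paths, but keeping a single path helper prevents the same mismatch from reappearing.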