buxiangzhiren / DDCap


tmp-results-{}.json save-path issue: the file is written to a different location than it is read from #10

Open verigle opened 1 year ago

verigle commented 1 year ago
Traceback (most recent call last):
  File "train.py", line 869, in <module>
    main(args)
  File "train.py", line 836, in main
    result = val(model, epoch, val_dataloader, args)
  File "/home/verigle/miniconda3/envs/DDCap/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "train.py", line 665, in val
    part_result = json.load(open(f"{args.out_dir}/.cache/{args.tag}/tmp-results-{i}.json"))
FileNotFoundError: [Errno 2] No such file or directory: './OUTPUT/checkpoints/caption_diff_vitb16/.cache/caption_diff_vitb16/tmp-results-0.json'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2856941) of binary: /home/verigle/miniconda3/envs/DDCap/bin/python

However, the code that saves the file does not prepend the args.out_dir prefix, so the path the temporary results are written to does not match the path they are later read from:

    # The results are written to ".cache/{args.tag}/..." (relative to the working directory):
    os.makedirs(f'.cache/{args.tag}', exist_ok=True)
    json.dump(result_all, open(f".cache/{args.tag}/tmp-results-{dist.get_rank()}.json", "w"))
    torch.distributed.barrier()
    if dist.get_rank() == 0:
        result_all = []
        ra_id = []
        for i in range(dist.get_world_size()):
            # ...but read back from "{args.out_dir}/.cache/{args.tag}/...", which is a different path:
            part_result = json.load(open(f"{args.out_dir}/.cache/{args.tag}/tmp-results-{i}.json"))
            for ep in part_result:
                if ep['image_id'] not in ra_id:
                    ra_id.append(ep['image_id'])
                    result_all.append(ep)
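One way to resolve the mismatch (a sketch only, not necessarily the maintainer's actual fix) is to build the cache directory path once with the args.out_dir prefix and use it for both writing and reading; args, dist, json, os, torch and result_all are assumed to be the same objects as in the snippet above.

    # Build the cache path once so the write and read sides cannot diverge.
    cache_dir = f"{args.out_dir}/.cache/{args.tag}"
    os.makedirs(cache_dir, exist_ok=True)

    # Each rank writes its partial results under the shared out_dir-prefixed path.
    with open(f"{cache_dir}/tmp-results-{dist.get_rank()}.json", "w") as f:
        json.dump(result_all, f)
    torch.distributed.barrier()

    if dist.get_rank() == 0:
        result_all = []
        ra_id = []
        for i in range(dist.get_world_size()):
            # Rank 0 reads every rank's file back from the same directory it was written to.
            with open(f"{cache_dir}/tmp-results-{i}.json") as f:
                part_result = json.load(f)
            for ep in part_result:
                if ep['image_id'] not in ra_id:
                    ra_id.append(ep['image_id'])
                    result_all.append(ep)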
buxiangzhiren commented 1 year ago

OK, very sorry about that, and thanks for pointing it out.

buxiangzhiren commented 1 year ago

It has been fixed.