Closed: ghost closed this issue 2 years ago
I used the script and config `./tools/slurm_train.sh a100 vild configs/lvis/detpro_ens_20e.py workdirs/vild_ens_20e_fg_bg_5_10_end --cfg-options model.roi_head.load_feature=True` to reproduce ViLD on 8 32 GB A100 GPUs with batch size 24 (8×3). In that issue you report only 0.75 s per iteration, but for me it is 6 s, so 20 epochs would take about 30 days.
I guess you have got some path wrong. Can you check whether lvis_clip_image_embedding.zip has been loaded successfully?
I put lvis_clip_image_embedding.zip under ./data and unzipped it to ./data/lvis_clip_image_embedding, e.g. home/detpro/data/lvis_clip_image_embedding/train2017/000000000030.pth
@dyabel
Then there should be no problem. The only reason I can think of for such a long training time is that the pre-extracted embeddings are not loaded correctly; in that case the code falls back to running the CLIP forward pass online, which is much slower.
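To rule that out, here is a quick sanity check you could run (a sketch only; the directory layout is taken from this thread, and the function names `embedding_path`/`missing_embeddings` are mine, not part of the repo) to verify the unzipped embedding files are where the dataloader expects them:

```python
from pathlib import Path

def embedding_path(data_root, split, image_id):
    # Expected layout from this thread:
    # <data_root>/lvis_clip_image_embedding/<split>/<12-digit id>.pth
    return Path(data_root) / "lvis_clip_image_embedding" / split / f"{image_id:012d}.pth"

def missing_embeddings(data_root, split, image_ids):
    # Return the ids whose pre-extracted CLIP embedding file is absent on disk;
    # any hit means training would fall back to online CLIP forwarding.
    return [i for i in image_ids if not embedding_path(data_root, split, i).exists()]

print(embedding_path("data", "train2017", 30).as_posix())
# data/lvis_clip_image_embedding/train2017/000000000030.pth
```

If `missing_embeddings` returns a non-empty list for a sample of your training image ids, the .pth files are not at the path the code is reading from.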