Closed: ghost closed this issue 2 years ago
I used the script and config `./tools/slurm_train.sh a100 vild configs/lvis/detpro_ens_20e.py workdirs/vild_ens_20e_fg_bg_5_10_end --cfg-options model.roi_head.load_feature=True` to reproduce ViLD on 8 32 GB A100 GPUs with batch size 24 (8×3). In that issue you report only 0.75 s per iteration, but for me it is 6 s, so 20 epochs would take about 30 days.
I guess you have got some path wrong. Can you check whether lvis_clip_image_embedding.zip has been loaded successfully?
I put lvis_clip_image_embedding.zip under ./data and unzipped it to ./data/lvis_clip_image_embedding, e.g. home/detpro/data/lvis_clip_image_embedding/train2017/000000000030.pth
@dyabel
Then there should be no problem. The only reason I can think of for such a long training time is that the pre-extracted embeddings are not loaded correctly; in that case the code falls back to running the CLIP forward pass online, which is much slower.
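To rule that out, here is a quick sanity check you could run (a sketch only; the directory layout is taken from this thread, and the function names `embedding_path`/`missing_embeddings` are mine, not part of the repo) to verify the unzipped embedding files are where the dataloader expects them:

```python
from pathlib import Path

def embedding_path(data_root, split, image_id):
    # Expected layout from this thread:
    # <data_root>/lvis_clip_image_embedding/<split>/<12-digit id>.pth
    return Path(data_root) / "lvis_clip_image_embedding" / split / f"{image_id:012d}.pth"

def missing_embeddings(data_root, split, image_ids):
    # Return the ids whose pre-extracted CLIP embedding file is absent on disk;
    # any hit means training would fall back to online CLIP forwarding.
    return [i for i in image_ids if not embedding_path(data_root, split, i).exists()]

print(embedding_path("data", "train2017", 30).as_posix())
# data/lvis_clip_image_embedding/train2017/000000000030.pth
```

If `missing_embeddings` returns a non-empty list for a sample of your training image ids, the .pth files are not at the path the code is reading from.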