dyabel / detpro

Apache License 2.0

Prepare.sh structure of the zip directory #23

Closed ZhuoranYu closed 2 years ago

ZhuoranYu commented 2 years ago

Hi,

Thanks again for your work. I followed the instructions in prepare.sh and tried to reproduce vild*. However, the estimated training time is more than 5 days instead of the ~2 days you mentioned in other posts. I suspect the issue is with lvis_clip_image_embedding.zip and want to confirm whether what I have is correct.

  1. I followed the first command in prepare.sh but changed it to run on 8 GPUs: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False total_epochs=1. This gave me 99342 *.pth files under data/lvis_clip_image_embedding/train2017.
  2. After step 1, I ran zip -r data/lvis_clip_image_embedding.zip data/lvis_clip_image_embedding/* from the root directory, which produced data/lvis_clip_image_embedding.zip.
  3. I ran ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/vild --cfg-options model.roi_head.prompt_path=lvis_clip_text_embedding.pt model.roi_head.load_feature=True with 8 GPUs, as instructed in another post. I checked GPU usage and all GPUs were fully loaded, so I don't think the problem is GPU utilization. However, the estimated finishing time is more than 5 days.

I checked the hierarchy inside lvis_clip_image_embedding.zip, and the internal paths look like data/lvis_clip_image_embedding/train2017/000000203466.pth, while the zip file itself is already under ./data. In other words, a redundant data/ level seems to be created inside the zip file. I'm not sure whether this is expected. If not, what is the expected hierarchy inside the zip file?
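For anyone who wants to run the same check on their own archive, here is a minimal sketch using Python's standard zipfile module. The function name summarize_zip_layout and the depth parameter are made up for illustration; nothing here comes from the detpro code base.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip_layout(zip_path, depth=3):
    """Count archive members grouped by their first `depth` path components.

    Hypothetical helper for inspecting an archive; not part of detpro.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                # Skip explicit directory entries, count files only.
                continue
            parts = PurePosixPath(name).parts
            counts["/".join(parts[:depth])] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the steps above; adjust to your checkout.
    for prefix, n in summarize_zip_layout("data/lvis_clip_image_embedding.zip").items():
        print(f"{n:8d}  {prefix}")
```

With the archive built as in step 2, every member should fall under a prefix starting with data/, which is the extra level described above.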

Thanks for your help!

ZhuoranYu commented 2 years ago

I think I have solved the problem. It turns out that the path used to load the embeddings in standard_roi_head.py needs to be modified so that it is consistent with the paths inside the zip file.
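To illustrate what "consistent with the path in the zip file" means, here is a rough sketch that looks up an embedding under a few possible member prefixes. It is not the code in standard_roi_head.py, whose actual path string is not shown in this thread; find_member and CANDIDATE_PREFIXES are hypothetical names, and the prefixes are guesses at plausible layouts.

```python
import zipfile

# Hypothetical candidate layouts. The first one matches an archive created
# with `zip -r data/lvis_clip_image_embedding.zip data/lvis_clip_image_embedding/*`.
CANDIDATE_PREFIXES = (
    "data/lvis_clip_image_embedding/train2017",
    "lvis_clip_image_embedding/train2017",
    "train2017",
)


def find_member(zf, file_name, prefixes=CANDIDATE_PREFIXES):
    """Return the archive member name for `file_name`, trying each prefix in order."""
    for prefix in prefixes:
        candidate = f"{prefix}/{file_name}"
        try:
            zf.getinfo(candidate)  # raises KeyError if the member is absent
            return candidate
        except KeyError:
            continue
    raise KeyError(f"{file_name} not found under any of {prefixes}")
```

The bytes returned by zf.read(member) can then be wrapped in io.BytesIO and passed to torch.load, which accepts file-like objects. Whichever prefix matches is the one the loading code should use.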

A reference for future users: if your program loads the CLIP embeddings correctly, no directory called lvis_clip_image_embedding should be created under data/.
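A quick way to check for that symptom during training, as a small sketch. It assumes the unzipped directory from the data collection step was removed after zipping, so any directory found is one that was created later; the helper name stray_embedding_dir is made up for illustration.

```python
import os


def stray_embedding_dir(data_root="data", name="lvis_clip_image_embedding"):
    """Return the path of an unzipped embedding directory if one exists, else None."""
    path = os.path.join(data_root, name)
    return path if os.path.isdir(path) else None


if __name__ == "__main__":
    found = stray_embedding_dir()
    if found:
        print(f"Found {found}: embeddings may not be loading from the zip file.")
    else:
        print("No stray embedding directory under data/.")
```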

After fixing this, the estimated training time became three and a half days with 8 V100 GPUs. That is still a bit slower than the author's estimate, but it seems OK.