damaggu / TADP

Text-Image Alignment for Diffusion-based Perception (TADP) - CVPR 2024
https://www.vision.caltech.edu/tadp/
Apache License 2.0

Inquiry Regarding Reproducing mIoU Results on VOC2012 Dataset #4

Open master-Shix opened 2 months ago

master-Shix commented 2 months ago

First and foremost, I would like to express my sincere gratitude for your outstanding work. It has been incredibly insightful and inspiring. However, I am currently facing some challenges in reproducing the segmentation results on the VOC2012 dataset as described in your paper.

In particular, I noticed that the highest mIoU result of 87.11 was achieved using the BLIP40 caption. However, my experiments have only yielded an mIoU of 84.66. I would greatly appreciate it if you could provide some additional details regarding the hardware setup you used. Specifically, could you let me know which GPU devices were utilized and whether multi-GPU training was involved?

Furthermore, I would like to confirm if the command used to achieve the 87.11 mIoU was the one provided in the cvpr_experiments section of your repository:

python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 2 --val_batch_size 2 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --debug

Thank you very much for your time and assistance. I look forward to your guidance.

master-Shix commented 2 months ago

Thank you for your quick reply, but could I ask what this is? Like an exe file?

nkondapa commented 2 months ago

Hi, it looks like there's a mistake in the provided command. The --debug flag is still set, which overrides the batch size to 1. If you delete it, the effective batch size goes up to 8. I will update the file in cvpr_experiments. Let me know if this works for you.
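For illustration, here is a minimal argparse sketch of how a --debug flag can silently override the batch size in this kind of trainer; the names are hypothetical and this is not the actual train_tadp.py code:

```python
import argparse

def parse_args(argv):
    # Illustrative flags mirroring the command-line options discussed above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--accum_grad_batches", type=int, default=4)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    if args.debug:
        # Debug mode forces a tiny batch for fast iteration,
        # which also shrinks the effective batch size.
        args.batch_size = 1
    return args

args = parse_args(["--batch_size", "2", "--accum_grad_batches", "4", "--debug"])
print(args.batch_size * args.accum_grad_batches)  # 4 with --debug, 8 without
```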

master-Shix commented 2 months ago

Thank you for your quick reply! Do you mean that I need to use python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 **--batch_size 2 --val_batch_size 2** --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json

or

python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json

There doesn't seem to be any other place in the code to update batch_size from 2 to 8.

master-Shix commented 2 months ago

I'm curious about the potential impact on results when using 4 A6000 GPUs with a batch size of 8. Would you mind sharing your thoughts on this setup? For context, here's the command I used:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4

I'd appreciate any insights you might have on how this configuration could influence the outcome. Thank you for your time and expertise.

nkondapa commented 2 months ago

> Thank you for your quick reply! Do you mean that I need to use python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 2 --val_batch_size 2 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json
>
> or
>
> python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json
>
> There doesn't seem to be any other place in the code to update batch_size from 2 to 8.

Just removing the debug flag would result in an effective batch size of 8 (on a single GPU): with --batch_size 2 and --accum_grad_batches 4, the effective batch size is 2 × 4 = 8.
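The arithmetic above can be sketched as a small helper; the num_gpus term is an assumption for the DDP case discussed later, not something computed by the repo itself:

```python
def effective_batch_size(batch_size, accum_grad_batches, num_gpus=1):
    # Gradients accumulate over accum_grad_batches steps before each optimizer
    # update, and under data-parallel training each GPU contributes its own
    # per-device batch.
    return batch_size * accum_grad_batches * num_gpus

print(effective_batch_size(2, 4))  # 8, the single-GPU setting from the paper command
print(effective_batch_size(8, 1))  # 8, equivalent without gradient accumulation
```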

nkondapa commented 2 months ago

> I'm curious about the potential impact on results when using 4 A6000 GPUs with a batch size of 8. Would you mind sharing your thoughts on this setup? For context, here's the command I used:
>
> CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4
>
> I'd appreciate any insights you might have on how this configuration could influence the outcome. Thank you for your time and expertise.

You should probably set accum_grad_batches to 1 so it is not used. The setup looks fine; I would expect the larger batch size to improve the results, but of course I don't know for sure. You may want to scale the learning rate with the batch size. Also, I think we last tested this code on just 1 GPU, so it may have some problems in the multi-GPU setting. It should be possible to make it work, though.
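The "scale the lr with the batch size" suggestion is commonly done with the linear scaling rule; a minimal sketch, assuming a hypothetical base learning rate (the actual value used by train_tadp.py is not stated in this thread):

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    # Linear scaling rule: grow the learning rate in proportion to the
    # effective batch size. This is a heuristic starting point, not a
    # guarantee of matched results.
    return base_lr * new_batch_size / base_batch_size

# e.g. going from an effective batch of 8 (1 GPU) to 32 (4 GPUs x batch 8),
# with a hypothetical base_lr of 1e-4:
print(scaled_lr(1e-4, 8, 32))  # 4e-4
```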

master-Shix commented 1 month ago

Thank you very much for your prompt and helpful reply. I truly appreciate your guidance and support throughout this process. I'm pleased to report some progress following your suggestions. After removing the --debug flag, I achieved an mIoU of 86.7. Further adjustments, setting the batch size to 8 and --accum_grad_batches 1, resulted in a slight improvement to 86.8 mIoU. While these results are encouraging, I'm still striving to reach the 87.11 mIoU reported in the paper. I've also experimented with multi-GPU training using the following configuration:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 1 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4

With this setup and doubling the learning rate, the highest mIoU I've achieved is 86.3. I'm curious about your thoughts on these results. Do you consider this performance within the expected range? I've noticed that the multi-GPU setup hasn't led to improved results as I had anticipated. Are there any potential issues or optimizations you might suggest?

Additionally, I wonder if you have any checkpoint files from your VOC2012 segmentation dataset training that you could share? This could be immensely helpful for benchmarking and troubleshooting.

Once again, thank you for your time and expertise. Your insights are invaluable, and I'm looking forward to your perspective on these findings.