ananthu-aniraj / pdiscoformer

[ECCV 2024 Oral] Official implementation of the paper "PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers"

Unable to Reproduce Results on Flowers102 and PartImageNet OOD #1

Closed · tripleh1 closed this issue 2 months ago

tripleh1 commented 2 months ago

Hi,

First of all, thank you for this research—I'm really enjoying working with it.

I successfully reproduced the results on the CUB dataset, but I am having trouble reproducing the results on the Flowers102 and PartImageNet OOD datasets. Could you please provide an example command that works for these datasets?

For Flowers102, I used the following command with a single GPU and a batch size of 64:

```bash
python train_net.py \
  --model_arch vit_base_patch14_reg4_dinov2.lvd142m \
  --pretrained_start_weights \
  --data_path <data_path>/oxford_flower \
  --batch_size 64 \
  --epochs 28 \
  --dataset flowers102 \
  --save_every_n_epochs 16 \
  --num_workers 2 \
  --image_sub_path_train images \
  --image_sub_path_test images \
  --train_split 1 \
  --eval_mode test \
  --wandb_project Flowers \
  --job_type Flowers \
  --group Flowers \
  --snapshot_dir Flowers \
  --lr 2e-6 \
  --optimizer_type adam \
  --scheduler_type steplr \
  --scheduler_gamma 0.5 \
  --scheduler_step_size 4 \
  --scratch_lr_factor 1e4 \
  --modulation_lr_factor 1e4 \
  --finer_lr_factor 1e3 \
  --drop_path 0.0 \
  --smoothing 0 \
  --augmentations_to_use cub_original \
  --image_size 224 \
  --num_parts 4 \
  --weight_decay 0 \
  --total_variation_loss 1.0 \
  --concentration_loss 0.0 \
  --enforced_presence_loss 2 \
  --enforced_presence_loss_type enforced_presence \
  --pixel_wise_entropy_loss 1.0 \
  --gumbel_softmax \
  --freeze_backbone \
  --presence_loss_type original \
  --modulation_type layer_norm \
  --modulation_orth \
  --grad_norm_clip 2.0
```

For PartImageNet OOD, I used the following command:

```bash
python train_net.py \
  --model_arch vit_base_patch14_reg4_dinov2.lvd142m \
  --pretrained_start_weights \
  --data_path <data_path>/PartImageNet_OOD \
  --batch_size 64 \
  --epochs 28 \
  --dataset part_imagenet \
  --save_every_n_epochs 16 \
  --num_workers 2 \
  --image_sub_path_train train \
  --image_sub_path_test train \
  --anno_path_train <data_path>/PartImageNet/train_train.json \
  --anno_path_test <data_path>/PartImageNet/train_test.json \
  --train_split 1 \
  --eval_mode test \
  --wandb_project PartImageNet_OOD_K25 \
  --job_type PartImageNet_OOD_K25 \
  --group PartImageNet_OOD_K25 \
  --snapshot_dir PartImageNet_OOD \
  --lr 2e-6 \
  --optimizer_type adam \
  --scheduler_type steplr \
  --scheduler_gamma 0.5 \
  --scheduler_step_size 4 \
  --scratch_lr_factor 1e4 \
  --modulation_lr_factor 1e4 \
  --finer_lr_factor 1e3 \
  --drop_path 0.0 \
  --smoothing 0 \
  --augmentations_to_use cub_original \
  --image_size 224 \
  --num_parts 25 \
  --weight_decay 0 \
  --total_variation_loss 1.0 \
  --concentration_loss 0.0 \
  --enforced_presence_loss 2 \
  --enforced_presence_loss_type enforced_presence \
  --pixel_wise_entropy_loss 1.0 \
  --gumbel_softmax \
  --freeze_backbone \
  --presence_loss_type original \
  --modulation_type layer_norm \
  --modulation_orth \
  --grad_norm_clip
```

Could you please review these commands and let me know if I missed anything important? Additionally, any insights or recommendations for the PartImageNet OOD dataset would be highly appreciated.

Thank you!

ananthu-aniraj commented 2 months ago

Hi, thanks for raising the issue.

The only difference in training hyper-parameters between CUB (or NABirds) and the other datasets is the input image size, which changes from (518, 518) to (224, 224). From what I can see, you have already made this change in your commands.

Could you let me know exactly how much the results deviate, or whether there are other issues with running the code?

In my own experiments I used a batch size of 128 for these two datasets (with the learning rate set to 2.828e-6). If possible, could you try that as well?
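
For reference, 2.828e-6 matches the batch-64 learning rate scaled by the square root of the batch-size ratio (2e-6 × √(128/64) ≈ 2.828e-6); this is an inference, not something stated in the repo. A minimal sketch of that scaling rule, assuming that is how the value was derived:

```python
import math

def sqrt_scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root learning-rate scaling: lr_new = lr_base * sqrt(new_batch / base_batch)."""
    return base_lr * math.sqrt(new_batch / base_batch)

# 2e-6 at batch size 64 -> ~2.828e-6 at batch size 128
print(sqrt_scale_lr(2e-6, 64, 128))  # ~2.8284e-06
```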

ananthu-aniraj commented 2 months ago

Hi, just an update: I think the issue is the batch size.
Batch sizes of 128 and above should work (and have been tested).

If you're unable to use a larger batch size, an alternative is to set the weight decay to 0.05 (the value from the AdamW paper), i.e. pass `--weight_decay 0.05` instead of `--weight_decay 0`. I'll update the training instructions to include this. Thanks for letting me know about the issue!
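
For illustration only (this is not the repo's optimizer-construction code), decoupled weight decay at 0.05 in PyTorch looks like the following; the `Linear` layer is a stand-in for the actual model:

```python
import torch

# Illustration only: AdamW (decoupled weight decay) with the suggested weight_decay=0.05.
# The Linear layer is a placeholder; the repo builds its own model and optimizer.
model = torch.nn.Linear(768, 200)
optimizer = torch.optim.AdamW(model.parameters(), lr=2.828e-6, weight_decay=0.05)
```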

By the way, the `--dataset` flag in your PartImageNet OOD command should be set to `part_imagenet_ood`. I have tested batch sizes of 128 and up for this dataset as well.

Let me know if this helps! If you just want to use the models as-is, use the torch hub code in the model zoo file.
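
The loading pattern is the standard `torch.hub.load` call; the entrypoint name below is a placeholder, so check the model zoo file for the actual entrypoint names and arguments:

```python
import torch

# "<entrypoint_from_model_zoo>" is a hypothetical placeholder, not a real entrypoint;
# the actual torch.hub entrypoints are listed in the repo's model zoo file.
model = torch.hub.load("ananthu-aniraj/pdiscoformer", "<entrypoint_from_model_zoo>")
model.eval()
```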

ananthu-aniraj commented 2 months ago

I've added new instructions for training per dataset and included information about recommended batch sizes.

Feel free to re-open this issue if you still need help!