CrossmodalGroup / LAPS

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment, CVPR, 2024

Cannot reproduce the results #2

Open tianbao-anw opened 3 months ago

tianbao-anw commented 3 months ago

I ran the training command from README.md on several different machines, but none of the runs reached the results reported in the paper.

On a single 3090 GPU, the best results are:
Image to text (R@1, R@5, R@10): 71.9, 93.2, 96.5
Text to image (R@1, R@5, R@10): 60.7, 87.2, 92.7
rsum is 502.2

On dual 3090 GPUs, the best results are:
Image to text (R@1, R@5, R@10): 71.5, 93.1, 96.4
Text to image (R@1, R@5, R@10): 60.6, 87.3, 92.6
rsum is 501.5
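For readers comparing these numbers with the paper, rsum here is the sum of the six recall values (R@1, R@5, R@10 in both directions). The short check below is only a sketch: the helper name `rsum` is chosen for illustration and is not taken from the repository.

```python
def rsum(i2t, t2i):
    """Sum the six recall values: R@1, R@5, R@10 for both directions."""
    # Round to one decimal to match how the recalls are reported.
    return round(sum(i2t) + sum(t2i), 1)


# Values copied from the two runs reported above.
single_gpu = rsum((71.9, 93.2, 96.5), (60.7, 87.2, 92.7))
dual_gpu = rsum((71.5, 93.1, 96.4), (60.6, 87.3, 92.6))

print("single 3090 rsum:", single_gpu)
print("dual 3090 rsum:", dual_gpu)
```

Both totals agree with the rsum values stated in the logs, so the gap to the paper comes from the individual recalls rather than from how rsum is summed.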

The log file is attached: log.txt

tianbao-anw commented 3 months ago

Why is the performance so bad? Is there something wrong with the configuration?

darkpromise98 commented 3 months ago

In my experience, the batch size has a significant impact on the results: a larger batch size tends to yield better results. You can therefore try setting the batch size to 128, the same as in our provided training logs (https://drive.google.com/drive/folders/1m3Y9TMkas2efSbeDV_ESci6uwMGd3MUY).

In addition, the results on Flickr30K are not very stable, so please try training a few more times.

darkpromise98 commented 3 months ago

A batch size of 128 may be difficult to fit on a single 3090 GPU because of its memory limit.

tianbao-anw commented 3 months ago

Thanks for the quick reply, but when I train with 2 GPUs, the results are almost the same.

On dual 3090 GPUs, the best results are:
Image to text (R@1, R@5, R@10): 71.5, 93.1, 96.4
Text to image (R@1, R@5, R@10): 60.6, 87.3, 92.6
rsum is 501.5

I trained the code by following the instructions in README.md, using the following command:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 train.py --dataset f30k --multi_gpu 1 --logger_name runs/f30k_vit --batch_size 64 --vit_type vit --embed_size 512 --sparse_ratio 0.5 --aggr_ratio 0.4

It still doesn't reproduce the paper's results.

darkpromise98 commented 3 months ago

Results from multi-GPU training may differ from single-GPU training, even when the batch size is the same. I will check the results soon.
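One general reason multi-GPU and single-GPU runs can differ at the same total batch size is the number of in-batch negatives each sample is compared against. The sketch below only illustrates that arithmetic. It assumes a contrastive-style loss with one positive per anchor, and the function name and the `gather` flag are hypothetical; whether train.py gathers embeddings across processes has not been confirmed in this thread.

```python
def negatives_per_anchor(per_gpu_batch: int, world_size: int, gather: bool) -> int:
    """Count the in-batch negatives each anchor is compared against.

    gather=True assumes embeddings from all processes are collected
    (for example with an all-gather) before the loss is computed.
    gather=False assumes each process computes the loss on its local
    batch only. One positive per anchor is assumed for simplicity.
    """
    visible = per_gpu_batch * world_size if gather else per_gpu_batch
    return visible - 1  # every visible sample except the anchor's own positive


# One GPU with batch size 128.
single = negatives_per_anchor(128, 1, gather=False)
# Two GPUs with 64 samples each, loss computed per process.
dual_local = negatives_per_anchor(64, 2, gather=False)
# Two GPUs with 64 samples each, embeddings gathered before the loss.
dual_gathered = negatives_per_anchor(64, 2, gather=True)

print(single, dual_local, dual_gathered)
```

Under these assumptions, two processes with 64 samples each match a single batch of 128 only if the embeddings are gathered before the loss. Otherwise each sample sees roughly half as many negatives.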