LinWeizheDragon / FLMR

The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.

Result reproduction issue #26

Closed: xiewen354 closed this issue 1 month ago

xiewen354 commented 1 month ago

Hello. I ran example_use_preflmr.py for EVQA following your GitHub code and the Hugging Face checkpoint, but I couldn't reproduce the PR@5 = 73.1 reported in your paper. I used the parameters you mention on GitHub, so what could be the reason? Do I need to change the code or the parameters? The results I got and the parameters I used are below.

EVQA, checkpoint: /nas-alinlp/xiewen.xie/models/LinWeizheDragon/PreFLMR_ViT-G
Total number of questions: 3750
Pseudo Recall@1: 0.5096
Pseudo Recall@5: 0.7194666666666667
Pseudo Recall@10: 0.7861333333333334
Pseudo Recall@20: 0.8341333333333333
Pseudo Recall@50: 0.8848
Pseudo Recall@100: 0.9090666666666667
Pseudo Recall@500: 0.9349333333333333
Recall@1: 0.40266666666667
Recall@5: 0.624
Recall@10: 0.703733333333333333
Recall@20: 0.7805333333333333
Recall@50: 0.8616
Recall@100: 0.910933333333333334
Recall@500: 0.9621333333333333

source activate /nas-alinlp/xiewen.xie/envs/FLMR
python example_use_preflmr.py \
    --use_gpu --run_indexing \
    --index_root_path "." \
    --index_name EVQA_PreFLMR_ViT-G \
    --experiment_name EVQA \
    --indexing_batch_size 64 \
    --image_root_dir /nas-alinlp/xiewen.xie/DATASETS/EVQA_img \
    --dataset_hf_path /nas-alinlp/xiewen.xie/DATASETS/BByrneLab/multi_task_multi_modal_knowledge_retrieval_benchmark_M2KR \
    --dataset EVQA \
    --use_split test \
    --nbits 8 \
    --Ks 1 5 10 20 50 100 500 \
    --checkpoint_path /nas-alinlp/xiewen.xie/models/LinWeizheDragon/PreFLMR_ViT-G \
    --image_processor_name /nas-alinlp/xiewen.xie/models/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k \
    --query_batch_size 8 \
    --compute_pseudo_recall
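For context on the two metrics reported above: standard Recall@K typically counts a hit when an annotated positive passage appears in the top K retrieved results, while Pseudo Recall@K typically counts a hit when any top-K passage contains one of the gold answer strings. The sketch below is only an illustration of that distinction, not the evaluation code used by example_use_preflmr.py; all function and field names are hypothetical.

```python
# Illustrative sketch of Recall@K vs. Pseudo Recall@K (not the repo's actual evaluation code).
from typing import Dict, List


def recall_at_k(retrieved_ids: List[str], positive_ids: List[str], k: int) -> float:
    """1.0 if any annotated positive passage id is among the top-k results, else 0.0."""
    return float(any(pid in retrieved_ids[:k] for pid in positive_ids))


def pseudo_recall_at_k(retrieved_texts: List[str], answers: List[str], k: int) -> float:
    """1.0 if any top-k passage contains one of the gold answer strings (case-insensitive)."""
    top_k = [text.lower() for text in retrieved_texts[:k]]
    return float(any(ans.lower() in text for ans in answers for text in top_k))


def evaluate(examples: List[Dict], ks=(1, 5, 10, 20, 50, 100, 500)) -> Dict[str, float]:
    """Average both metrics over the whole query set."""
    results = {}
    for k in ks:
        results[f"Recall@{k}"] = sum(
            recall_at_k(ex["retrieved_ids"], ex["positive_ids"], k) for ex in examples
        ) / len(examples)
        results[f"Pseudo Recall@{k}"] = sum(
            pseudo_recall_at_k(ex["retrieved_texts"], ex["answers"], k) for ex in examples
        ) / len(examples)
    return results
```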

LinWeizheDragon commented 1 month ago

Hi, the pre-trained checkpoint achieves 0.721 on our side. There may be a very small difference from the number in the paper due to the checkpoint conversion from PyTorch to Hugging Face after pre-training; see the note here.

However, you can still achieve the same or even better fine-tuning performance (74.21) on EVQA using the fine-tuning code we provide here.
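If you want to sanity-check the converted Hugging Face checkpoint outside the indexing script, loading it roughly follows the pattern below. This is a sketch based on the usage example in this repository; adjust the checkpoint path (or point it at your local copy) and subfolder names if your setup differs.

```python
# Sketch of loading the converted Hugging Face checkpoint; adapt paths to your local setup.
from flmr import (
    FLMRContextEncoderTokenizer,
    FLMRModelForRetrieval,
    FLMRQueryEncoderTokenizer,
)

checkpoint_path = "LinWeizheDragon/PreFLMR_ViT-G"  # or a local path to the downloaded checkpoint

query_tokenizer = FLMRQueryEncoderTokenizer.from_pretrained(
    checkpoint_path, subfolder="query_tokenizer"
)
context_tokenizer = FLMRContextEncoderTokenizer.from_pretrained(
    checkpoint_path, subfolder="context_tokenizer"
)
model = FLMRModelForRetrieval.from_pretrained(
    checkpoint_path,
    query_tokenizer=query_tokenizer,
    context_tokenizer=context_tokenizer,
)
```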

xiewen354 commented 1 month ago

Thanks.