devaansh100 / CLIPTrans

Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", published at ICCV'23.

Pre-trained Model generated language code id #1

Closed by ZhishenYang 9 months ago

ZhishenYang commented 10 months ago

Hi Devaansh,

Thank you for open-sourcing the code. I am using the provided pre-trained model to replicate experimental results on the WIT dataset DE-ES.

However, the model outputs only language code IDs instead of a translation. Could you point out what might be wrong in my setup? Thank you.

Command used:

    python3 src/main.py \
        --num_gpus 1 \
        --mn wit_inference \
        --ds wit \
        --src_lang de \
        --tgt_lang es \
        --prefix_length 10 \
        --bs 1 \
        --test_ds test \
        --stage translate \
        --test \
        --lm model_best_test.pth

Source sentence: Joaquín Sabina (2007)

After tokenization:

    {'input_ids': tensor([[250003, 2177, 74688, 19, 35477, 76, 97666, 2]], device='cuda:0'),
     'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0'),
     'labels': tensor([[250005, 2177, 74688, 19, 35477, 76, 22, 21, 8002, 399, 146, 78374, 8, 8884, 22, 25499, 2]], device='cuda:0'),
     'mask_decoder_input_ids': tensor([[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]], device='cuda:0')}

Model outputs tensor([[ 2, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 250005, 2]], device='cuda:0')
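
For anyone hitting the same symptom, the check below is a minimal sketch of how to flag this kind of degenerate output automatically. It is not part of the CLIPTrans code: the function name `is_degenerate` is hypothetical, and the special-token IDs are simply the ones visible in the tensors above (2 as end-of-sequence, 250005 as the first token of `labels`, taken here to be the target language code).

```python
# Hypothetical helper, not from the CLIPTrans repository.
# IDs are read off the tensors posted above and are assumptions about their meaning.
EOS_ID = 2             # last token of both input_ids and labels
TGT_LANG_ID = 250005   # first token of labels (assumed target language code)


def is_degenerate(token_ids, special_ids=(EOS_ID, TGT_LANG_ID)):
    """Return True if the sequence contains nothing but special tokens."""
    return all(tok in special_ids for tok in token_ids)


# Shortened stand-in for the reported output: EOS, the language code repeated, EOS.
reported_output = [EOS_ID] + [TGT_LANG_ID] * 10 + [EOS_ID]

# The reference labels from the post, which do contain content tokens.
reference_labels = [250005, 2177, 74688, 19, 35477, 76, 22, 21, 8002,
                    399, 146, 78374, 8, 8884, 22, 25499, 2]

print(is_degenerate(reported_output))   # only special tokens
print(is_degenerate(reference_labels))  # has content tokens
```

With a PyTorch tensor such as the one above, a single row can be passed as `outputs[0].tolist()`.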

ZhishenYang commented 9 months ago

Resolved: the problem was on my side. I had installed the wrong transformers library.
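
A quick way to see which build of a package is active in the current environment is sketched below, using only the Python standard library. The helper name `installed_version` is hypothetical, and the package name `transformers` is an assumption based on the comment above; the version the repository expects is not stated in this thread, so compare the printed value against the repository's own setup instructions.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed package name; prints None if it is not installed in this environment.
print("transformers:", installed_version("transformers"))
```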