airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Using LXMERT for image captioning #86

Open freeIsa opened 3 years ago

freeIsa commented 3 years ago

Hello, I am currently exploring image captioning and would like to understand whether, and how, pretrained LXMERT could be used for this task. As a first attempt, I extracted features from a sample of personal images using the Docker container you provided and ran the VQA task on them, always setting the question to a simple "what is this?". This is the command I used to run inference:

PYTHONPATH=$PYTHONPATH:./src python src/tasks/vqa.py --test test --loadLXMERTQA snap/pretrained/model

The result was a single, rather reasonable word per image but, obviously, pretty far from a complete caption.
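
For readers who want to reproduce the fixed-question setup described above, here is a minimal sketch of how the question entries could be generated. The key names `img_id`, `question_id`, and `sent` are assumptions about the JSON layout the VQA data loader expects, and the helper `build_fixed_question_entries` and the sample image ids are hypothetical; check them against the repository's data files before relying on this.

```python
import json


def build_fixed_question_entries(image_ids, question="what is this?"):
    """Pair every image id with the same question text.

    The dictionary keys are assumed, not confirmed against the repository.
    """
    entries = []
    for index, image_id in enumerate(image_ids):
        entries.append({
            "img_id": image_id,    # assumed key: id matching the extracted features
            "question_id": index,  # assumed key: any integer unique per entry
            "sent": question,      # assumed key: the question text
        })
    return entries


if __name__ == "__main__":
    # Hypothetical ids for two personal images.
    sample_ids = ["my_photo_0001", "my_photo_0002"]
    print(json.dumps(build_fixed_question_entries(sample_ids), indent=2))
```

The printed list could be saved as the test split file, provided the image ids match those written during feature extraction.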

Do you have any suggestions on how to tackle image captioning with LXMERT? Any help or pointers would be much appreciated!

airsplay commented 3 years ago

Hmmm. I am not sure about that. The new VL-pretraining paper OSCAR reports a strong score on image captioning (CIDEr = 140) that you might be interested in.

freeIsa commented 3 years ago

Thanks for the pointer!

yezhengli-Mr9 commented 3 years ago

Hi

Hi @freeIsa, could you share /workspace/features/extract_nlvr2_image.py with me? I am following issue #79.