freeIsa opened this issue 3 years ago
Hmmm, I am not sure about that. The new VL-pretraining paper OSCAR reports a strong image captioning score (CIDEr = 140) that you might be interested in.
Thanks for the pointer!
Hello, I am currently exploring the task of image captioning and I'd like to understand whether/how pretrained LXMERT could be used for such a task. As a first attempt, I extracted features from a sample of personal images using the Docker container you provided and ran the VQA task on them, always setting the question to a simple "what is this?". This is the command I used to run the inference:
PYTHONPATH=$PYTHONPATH:./src python src/tasks/vqa.py --test test --loadLXMERTQA snap/pretrained/model
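In case it helps, this is roughly how I generated the questions for that run: a minimal sketch that writes one "what is this?" entry per extracted image into a test json. The field names (question_id, img_id, sent, label) are my reading of what the VQA data loader expects, and the image ids are placeholders, so please double-check against the json files the repo ships.

import json
import os

# Placeholder ids; these have to match the img_id column of the extracted feature tsv.
feature_img_ids = ["img_0001", "img_0002"]

entries = []
for i, img_id in enumerate(feature_img_ids):
    entries.append({
        "question_id": i,            # any unique integer
        "img_id": img_id,
        "sent": "what is this?",     # the same question for every image
        "label": {},                 # no ground-truth answers for personal photos
    })

os.makedirs("data/vqa", exist_ok=True)
with open("data/vqa/test.json", "w") as f:
    json.dump(entries, f)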
The result was a single, fairly reasonable word per image, but obviously pretty far from a complete caption.
Do you have suggestions on how to tackle the image captioning task with LXMERT? Any help/pointers would be much appreciated!
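For context, the direction I was considering is to keep the pretrained cross-modal encoder and train a small autoregressive decoder on top of its output, roughly like the sketch below. This is plain PyTorch and purely illustrative: the 768-dim pooled vector stands in for whatever the pretrained encoder actually returns (the same kind of vector the VQA head consumes), and the vocabulary and decoder sizes are placeholders.

import torch
import torch.nn as nn

HIDDEN = 768     # assumed size of the pooled cross-modal feature
VOCAB = 10000    # placeholder caption vocabulary size
MAX_LEN = 20

class CaptionDecoder(nn.Module):
    """Tiny GRU decoder conditioned on a single cross-modal feature vector.
    In a real setup the vector would come from the pretrained encoder."""
    def __init__(self, vocab=VOCAB, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, captions):
        # img_feat: (B, HIDDEN) pooled encoder output, used as the initial hidden state
        # captions: (B, T) token ids fed with teacher forcing during training
        h0 = img_feat.unsqueeze(0)            # (1, B, HIDDEN)
        emb = self.embed(captions)            # (B, T, HIDDEN)
        hidden_states, _ = self.gru(emb, h0)  # (B, T, HIDDEN)
        return self.out(hidden_states)        # (B, T, VOCAB) logits

# Usage sketch with random tensors in place of real encoder output and captions
# (in real training the targets would be shifted by one position).
decoder = CaptionDecoder()
fake_feat = torch.randn(4, HIDDEN)
fake_caps = torch.randint(0, VOCAB, (4, MAX_LEN))
logits = decoder(fake_feat, fake_caps)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), fake_caps.reshape(-1))
print(logits.shape, loss.item())

In practice a transformer decoder that cross-attends to the full sequence of object features would probably beat conditioning on a single pooled vector, but the GRU keeps the sketch short.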
Hi @freeIsa, could you share /workspace/features/extract_nlvr2_image.py with me? I am following issue #79.