Closed: Maxlinn closed this issue 8 months ago.
Hi Lin, your series of works (RA-VQA, FLMR, PreFLMR) has made great contributions to the field of KB-VQA, which is really impressive!
Recently I have taken a special interest in the InfoSeek task. After reading with due care, I still have two questions about the details; I wonder if you could generously help:
Thanks in advance!

Hi, thanks for your interest in our work. Re your questions:
1. We simply formatted the passage text as `"title:", example['title'], "content:", example['text']` and did not do chunking. We left the passages long because we also wanted to facilitate possible future extensions to passage length. You can of course chunk the passages with any technique you like; just make sure the comparison is fair (e.g. compare models on the same chunked passage pool).
2. In answer generation (for Infoseek), we use the same BLIP-2 model as in FLMR, with the prompt format `Question: ... Caption: ... Objects: ... Knowledge: ... Answer:` (see the sketch after this list). Specifically, the caption is extracted with BLIP-2 image captioning and the objects are extracted with VinVL.
3. Yes, we resplit the validation set of Infoseek to create a val set and a test set for the M2KR benchmark. The reported result is for the M2KR test split (which is subsampled from the original validation set).
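For concreteness, here is a minimal sketch of how the passage string (point 1) and the answer-generation prompt (point 2) could be assembled. The helper names (`format_passage`, `build_prompt`), the separators, and the usage values are my own assumptions for illustration; only the field order (`title`/`content` for passages, `Question/Caption/Objects/Knowledge/Answer` for the prompt) comes from the reply above.

```python
# Minimal sketch; helper names, separators, and example values are
# assumptions. Only the field order comes from the maintainers' reply.

def format_passage(example: dict) -> str:
    """Format a knowledge-base entry as 'title: ... content: ...' (no chunking)."""
    return f"title: {example['title']} content: {example['text']}"

def build_prompt(question: str, caption: str, objects: list[str], knowledge: str) -> str:
    """Assemble the BLIP-2 answer-generation prompt.

    The knowledge is placed last so that, if the model's context window
    is exceeded, truncation cuts into the passage rather than into the
    question, caption, or objects.
    """
    return (
        f"Question: {question} "
        f"Caption: {caption} "
        f"Objects: {', '.join(objects)} "
        f"Knowledge: {knowledge} "
        f"Answer:"
    )

# Hypothetical usage with one retrieved passage:
example = {"title": "Eiffel Tower", "text": "The Eiffel Tower is a wrought-iron lattice tower ..."}
prompt = build_prompt(
    question="Who designed this structure?",
    caption="a tall iron tower against a blue sky",  # from BLIP-2 image captioning
    objects=["tower", "sky"],                        # from VinVL object detection
    knowledge=format_passage(example),
)
```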
Much appreciation for your timely and detailed response; I have a much better understanding now!
About bullet 2, I have one more tiny question. My best guess is that "Knowledge:" should be followed by the retrieved passage, but the Wikipedia articles in the Infoseek knowledge base run to thousands of words, which seems impossible to fit into BLIP-2's context length without chunking. Could you please shed more light on this?
Thanks for your patience!
Hi, your concern is correct. We did not truncate the passages when generating the answer, so the input can potentially run out of the allowed tokens. This is also why we put the knowledge after the other useful information, such as the caption: if truncation happens, it cuts into the passage rather than into the question or caption.
In any case, our purpose is to show that models can easily get a performance boost from augmented knowledge. If you want to delve into the VQA ability itself, we highly recommend more careful processing, which should yield better final performance. An easy approach would be splitting the passages into chunks, as you mentioned in your previous post (see the sketch below), or replacing BLIP-2 with a more advanced LMM that accepts more tokens. PreFLMR can be integrated with any answer generator.
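As one concrete illustration of the chunking suggestion, here is a possible sketch. The whitespace tokenization, chunk size, and overlap are arbitrary choices for illustration, not the paper's method; any tokenizer or chunker could be substituted, and, as noted earlier in the thread, a fair comparison requires evaluating all models on the same chunked passage pool.

```python
# Illustrative sketch of one chunking strategy; the whitespace split,
# chunk size, and overlap are arbitrary assumptions, not the paper's method.

def chunk_passage(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split a long passage into overlapping word-level chunks."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
        start += chunk_size - overlap
    return chunks

def chunk_knowledge_base(examples: list[dict]) -> list[dict]:
    """Expand each article into chunk-level entries.

    Each chunk inherits the article title, so downstream code can still
    format entries as 'title: ... content: ...' exactly as in the
    unchunked setting.
    """
    chunked = []
    for ex in examples:
        for i, chunk in enumerate(chunk_passage(ex["text"])):
            chunked.append({"title": ex["title"], "text": chunk, "chunk_id": i})
    return chunked
```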
Thanks again for the timely response! All my questions have been addressed!