LinWeizheDragon / FLMR

The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever (FLMR).

How to obtain passage_image_feature in the index_custom_collection method #20

Closed fengkangjie closed 2 days ago

fengkangjie commented 6 days ago

I noticed that the demo code in example_use_custom_functions.py uses passage_image_feature. In practice, how is this data obtained? And which gives better document retrieval: these pre-extracted features, or using the images directly? Can you give me some advice?

LinWeizheDragon commented 6 days ago

It'd be appreciated if you could translate your question into English so that it is also accessible to a wider audience.

The passage_image_features can be extracted with PreFLMR's ViT model and the mapping network. After training the PreFLMR system on Image+Text -> Text retrieval, the image representations are aligned with the text representations. Since the late-interaction design allows parallel representations, you can directly reuse PreFLMR's vision model to extract image features for passages and include them in the retrieval process. Naturally, a query image and a passage image will contribute high similarity scores to the final score when they are similar (as both are encoded by the same vision model). This enables Image+Text -> Image+Text retrieval.

You can directly use Option 3 in the script, which extracts features from the given images automatically using PreFLMR's vision model. In that case, you don't need to extract the features yourself.
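To make the pipeline concrete, here is a minimal sketch of the idea: a vision backbone produces image features, and a mapping network projects them into the same late-interaction space as the text tokens. This is not the exact FLMR API; the module name `MappingNetwork`, the dimensions, and the random stand-in for ViT output are all hypothetical placeholders for illustration.

```python
# Illustrative sketch only (not the exact FLMR API): passage image features
# are obtained by projecting vision-backbone features through a mapping
# network into the late-interaction embedding space shared with text tokens.
import torch
import torch.nn as nn


class MappingNetwork(nn.Module):
    """Hypothetical projection from ViT feature space to the
    late-interaction space used for MaxSim scoring."""

    def __init__(self, vision_dim: int = 1024, late_interaction_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(vision_dim, late_interaction_dim)

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_features)


# Stand-in for PreFLMR's ViT output: random tensors in place of real
# per-image features (in practice these come from the frozen vision model).
batch_size, vision_dim = 4, 1024
vit_features = torch.randn(batch_size, vision_dim)

mapper = MappingNetwork(vision_dim=vision_dim, late_interaction_dim=128)
passage_image_features = mapper(vit_features)
print(passage_image_features.shape)  # torch.Size([4, 128])
```

Because query images and passage images pass through the same vision model and mapper, similar images end up close in this shared space, which is what lets the late-interaction scoring reward image-image matches.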

Of course, reusing the vision model to encode passage images works, but it may be suboptimal. Consider training the system on Image+Text -> Image+Text tasks to achieve optimal performance.

fengkangjie commented 5 days ago

Sorry for that. I have translated the title and comments into English. Thank you very much for your answer.

fengkangjie commented 5 days ago

Another question: do I need to switch to a Chinese tokenizer to improve recall accuracy when encoding Chinese queries and documents, or can the default tokenizer also give good results?

LinWeizheDragon commented 5 days ago

The base text encoder is BERT-base-uncased, so it can handle some simple Chinese characters. However, the model was not pre-trained on a Chinese corpus, so you may need to fine-tune it to achieve optimal retrieval performance in Chinese. A quick workaround is to translate your queries and documents into English using a translator. We may train a multilingual version of PreFLMR and share it later.
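The translate-then-retrieve workaround can be sketched as a thin preprocessing step in front of the retriever. The `translate` function below is a hypothetical stub (a lookup table standing in for a real MT model or service); in practice you would swap in an actual translator before passing the English text to PreFLMR.

```python
# Sketch of the translate-then-retrieve workaround. `translate` is a
# placeholder: the dictionary below stands in for a real machine-translation
# model or API, which you would use in practice.
def translate(text: str, target_lang: str = "en") -> str:
    demo = {
        "什么是晚期交互检索?": "What is late-interaction retrieval?",
    }
    # Fall back to the original text if no translation is available.
    return demo.get(text, text)


query_zh = "什么是晚期交互检索?"
query_en = translate(query_zh)
# query_en is then encoded by PreFLMR in place of the original Chinese query.
print(query_en)  # What is late-interaction retrieval?
```

The same step would be applied once to each document at indexing time, so that both sides of the retrieval are in English.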