Does PreFLMR use all the tokens in a document when calculating similarity? Thank you!
When directly computing the exact similarity between a query and a document (e.g. in training), all tokens participate in the computation. The similarity is sum(MaxSim(q, d)), where q and d are the late-interaction embeddings of the query and the document; see the model's forward function. In retrieval, the engine performs an approximate search: for every query token, it looks up the most similar token embedding in the corpus and sums those similarities, so only the retrieved doc token embeddings contribute to the returned score. Of course, you can recover the exact similarity score by re-scoring with the original doc embeddings after index retrieval.
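For concreteness, here is a minimal PyTorch sketch of the two scoring paths described above; the function name, the toy shapes, and the brute-force max standing in for the engine's approximate nearest-neighbor search are illustrative assumptions, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def late_interaction_score(q, d):
    # sum(MaxSim(q, d)): for each query token embedding, take the max
    # similarity over the document's token embeddings, then sum over
    # the query tokens.
    sim = q @ d.T                       # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()

# Exact scoring, as in training: every document token participates.
q = F.normalize(torch.randn(32, 128), dim=-1)    # toy query token embeddings
d = F.normalize(torch.randn(512, 128), dim=-1)   # toy doc token embeddings
print(late_interaction_score(q, d).item())

# Retrieval-style scoring: for each query token, take the most similar
# token embedding anywhere in the indexed corpus and sum those; a
# brute-force max stands in here for the engine's approximate search.
# Only the retrieved token embeddings contribute to this score.
corpus = F.normalize(torch.randn(10_000, 128), dim=-1)
print((q @ corpus.T).max(dim=1).values.sum().item())
```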
Thank you very much! How many tokens are used for text queries, images, and documents in the PreFLMR model? Thanks!
What are the token numbers of 1, 2, 3 and 4?
32, 32, 32*num_patch_embeddings (depending on the vision encoder), 512
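As a toy illustration of how those counts would enter the sum(MaxSim) score sketched above; the embedding size, the `num_patch_embeddings` value, and the concatenation of text and image tokens into one query matrix are assumptions for illustration, not values taken from the model:

```python
import torch

dim = 128                       # placeholder embedding size, not the model's actual value
num_patch_embeddings = 256      # placeholder; depends on the vision encoder

text_tokens = torch.randn(32, dim)                          # text query: 32 tokens
image_tokens = torch.randn(32 * num_patch_embeddings, dim)  # image tokens, per the reply above
q = torch.cat([text_tokens, image_tokens], dim=0)           # assumed concatenation, for illustration
d = torch.randn(512, dim)                                   # document: 512 tokens

print((q @ d.T).max(dim=1).values.sum().item())             # sum(MaxSim) over all query tokens
```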
Thank you!