facebookresearch / BLINK

Entity Linker solution
MIT License

Training data for Cross-Encoder model #92

Open shzamanirad opened 3 years ago

shzamanirad commented 3 years ago

Hi,

It is mentioned in the paper that:

We train our cross-encoder model based on the top 100 retrieved results from our bi-encoder model on Wikipedia data. For the training of the cross-encoder model, we further down-sample our training data to obtain a training set of 1M examples.

Can you please provide this training data for cross-encoder?

Thanks.

abhinavkulkarni commented 2 years ago

@shzamanirad: The training data for the cross-encoder is output by the eval_biencoder.py script. For every datapoint in the train/test/valid split, the eval script outputs the top 64 retrieved candidates and calculates recall at various cutoffs (recall@1, @10, @64, etc.).

The top 64 retrieved candidates are then used by the cross-encoder as train/test/valid data.
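For reference, the recall@k metric mentioned above can be sketched as follows. This is a minimal illustration, not BLINK's actual evaluation code; the candidate IDs and gold labels are made-up toy data, and the function name `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(retrieved, gold, k):
    """Fraction of examples whose gold entity appears in the top-k candidates.

    retrieved: list of ranked candidate-ID lists, one per mention.
    gold: list of gold entity IDs, one per mention.
    """
    hits = sum(1 for cands, g in zip(retrieved, gold) if g in cands[:k])
    return hits / len(gold)

# Toy example: 3 mentions, each with a ranked candidate list.
retrieved = [
    ["Q1", "Q7", "Q3"],
    ["Q9", "Q2", "Q5"],
    ["Q4", "Q8", "Q6"],
]
gold = ["Q1", "Q5", "Q0"]

print(recall_at_k(retrieved, gold, 1))  # 1 of 3 gold entities ranked first
print(recall_at_k(retrieved, gold, 3))  # 2 of 3 gold entities in the top 3
```

Recall@64 on the bi-encoder output is effectively an upper bound on cross-encoder accuracy, since the cross-encoder can only rerank candidates the bi-encoder retrieved.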

atulbunkar commented 1 year ago

Hi, I had a doubt: if we train the cross-encoder model with 64 candidates, can we evaluate the model with 20 or 30 candidates?