Muennighoff / sgpt

SGPT: GPT Sentence Embeddings for Semantic Search
https://arxiv.org/abs/2202.08904
MIT License

Evaluating cross encoders #15

Open loopdeloop76 opened 1 year ago

loopdeloop76 commented 1 year ago

Hi @Muennighoff, I would like to use your cross encoder with different GPT models. I have noticed that this script is different from the code in the notebook. Could you explain the difference? Which code should I use, if I want to evaluate cross encoding for different GPT models (e.g. BioGPT)?

Also, do you happen to have the code for running the script in batches, as it is quite slow to predict each query / document pair one by one? Thanks Mark

Muennighoff commented 1 year ago

> Hi @Muennighoff, I would like to use your cross encoder with different GPT models. I have noticed that this script is different from the code in the notebook. Could you explain the difference? Which code should I use, if I want to evaluate cross encoding for different GPT models (e.g. BioGPT)?
>
> Also, do you happen to have the code for running the script in batches, as it is quite slow to predict each query / document pair one by one? Thanks Mark

Hey, I would recommend adapting this one: https://github.com/Muennighoff/sgpt/blob/main/crossencoder/beir/sgptce.py. It is equivalent to the notebook and already has batching, but you may want to remove some of the evaluation-related code.

Otherwise, the script in the README also works, but you will need to add batching and padding to it yourself.
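To illustrate what "adding batching and padding" means here, below is a minimal sketch of batched cross-encoder scoring in the SGPT-CE style: each (document, query) pair is scored by summing the causal LM's log-probabilities of the query tokens conditioned on the document prefix. The `ToyLM` class is a stand-in invented for this example; in practice you would swap in a real GPT model (e.g. via `transformers`) and pass an attention mask alongside the padded input ids. With a causal model, right-padding does not affect the scores of earlier positions, since they never attend to later tokens.

```python
import torch
import torch.nn.functional as F


def score_pairs(model, pad_id, pairs, batch_size=4):
    """Score (document_ids, query_ids) pairs in batches.

    Each score is the sum of the model's log-probabilities of the query
    tokens conditioned on the document prefix (the SGPT-CE idea).
    `model` maps a LongTensor of token ids (batch, seq) to logits.
    """
    scores = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        seqs = [doc + query for doc, query in chunk]
        max_len = max(len(s) for s in seqs)
        # Right-pad every sequence in the batch to the same length.
        input_ids = torch.full((len(chunk), max_len), pad_id, dtype=torch.long)
        for i, s in enumerate(seqs):
            input_ids[i, :len(s)] = torch.tensor(s)
        with torch.no_grad():
            logits = model(input_ids)               # (batch, seq, vocab)
        logprobs = F.log_softmax(logits, dim=-1)
        for i, (doc, query) in enumerate(chunk):
            total = 0.0
            # The token at position t is predicted by logits at t - 1.
            for t in range(len(doc), len(doc) + len(query)):
                total += logprobs[i, t - 1, seqs[i][t]].item()
            scores.append(total)
    return scores


class ToyLM(torch.nn.Module):
    """Tiny stand-in LM so the sketch runs without downloading weights."""

    def __init__(self, vocab=16, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.out = torch.nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.out(self.emb(ids))


torch.manual_seed(0)
lm = ToyLM()
pairs = [([1, 2, 3], [4, 5]), ([6, 7], [8])]
print(score_pairs(lm, pad_id=0, pairs=pairs))
```

A useful sanity check is that scoring with `batch_size=1` and a larger batch size gives identical results, which confirms the padding does not leak into the scores.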

Feel free to share if you have a nice script!

loopdeloop76 commented 1 year ago

Great, that helps, thank you for your quick response!