Any scripts or guidance on training

facebookresearch / Sphere

Web-scale retrieval for knowledge-intensive NLP


Any scripts or guidance on training #7

Open mprorock opened 2 years ago

mprorock commented 2 years ago

Hey, this is awesome work!
Are there any tips, scripts, or other resources I should look at for training on a separate corpus? Similarly, are there any documented methods for adding additional material to the model?

mprorock commented 2 years ago

Scratch that. It looks like the approach is to use Pyserini, with or without dense indexes depending on the desired behavior. Is that a correct read? And then use DPR as appropriate for bidirectional encodings?

ola13 commented 2 years ago

Hi @mprorock! Yes, our current repo is focused on:

- providing access to pre-built indices
- serving them using existing infrastructure: Pyserini for the sparse index and distributed-faiss for the dense index
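For anyone landing on this thread, querying the sparse index through Pyserini might look roughly like the sketch below. `LuceneSearcher` and its `search` method come from Pyserini; the helper names `top_hits` and `search_sparse_index`, the index path, and the query are hypothetical placeholders, and the actual index location depends on where you unpacked the download.

```python
from typing import List, Tuple


def top_hits(searcher, query: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (docid, score) pairs for the k best matches.

    `searcher` is expected to behave like Pyserini's LuceneSearcher:
    a `search(query, k)` method returning hits with `docid` and `score`.
    """
    hits = searcher.search(query, k=k)
    return [(hit.docid, float(hit.score)) for hit in hits]


def search_sparse_index(index_dir: str, query: str, k: int = 5) -> List[Tuple[str, float]]:
    """Open a Lucene index directory with Pyserini and run one query."""
    # Imported lazily so `top_hits` stays usable without Pyserini installed.
    from pyserini.search.lucene import LuceneSearcher

    searcher = LuceneSearcher(index_dir)
    return top_hits(searcher, query, k=k)


# Hypothetical usage (path and query are placeholders, not from the repo):
# for docid, score in search_sparse_index("path/to/sparse_index", "who discovered penicillin"):
#     print(docid, score)
```

The dense index is served through distributed-faiss instead and needs a query encoder, so it is not covered by this sketch.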

As far as bi-encoder training is concerned, https://github.com/facebookresearch/DPR is a good place to start.
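To give a sense of what bi-encoder training optimizes, here is a toy sketch of the in-batch-negatives objective that DPR is known for: each question is scored against every passage in the batch by dot product, and the loss is the negative log-likelihood of its own positive passage. This is an illustration on plain Python lists, not code from the DPR repo; the function names and the assumption that the embeddings are already computed are mine.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_nll(
    questions: Sequence[Sequence[float]],
    passages: Sequence[Sequence[float]],
) -> float:
    """Mean negative log-likelihood with in-batch negatives.

    passages[i] is treated as the positive for questions[i]; every other
    passage in the batch serves as a negative for that question.
    """
    if len(questions) != len(passages) or not questions:
        raise ValueError("need one positive passage per question")

    total = 0.0
    for i, q in enumerate(questions):
        scores = [dot(q, p) for p in passages]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(questions)
```

In a real setup the vectors would come from the question and passage encoders, and the gradient of this loss would update both; training on a separate corpus then amounts to supplying your own question/positive-passage pairs.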