facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Searching Alternative Databases/Custom Functions #4381

Closed. TKTSWalker closed this issue 2 years ago.

TKTSWalker commented 2 years ago

Hello, I would like to add ways for BlenderBot to check data using custom-made functions over different file types, such as .txt and .xlsx. Formatting the data and getting the model to "understand" what is returned should be easy; the hard part is initiating the search and keeping it local. Can someone point me in the right direction?
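For the "formatting" half of this question, any retriever ultimately needs the local files turned into a list of passages. Below is a minimal sketch, assuming plain-text files (one document per file) and CSV files (one document per row). The `LocalDoc` type and `load_local_docs` helper are hypothetical, not ParlAI APIs. .xlsx files would need a spreadsheet library such as openpyxl, or an export to CSV first; that step is not shown.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class LocalDoc:
    """Hypothetical record type: one retrievable passage."""
    title: str
    text: str
    docid: str


def load_local_docs(folder: str) -> List[LocalDoc]:
    """Read every .txt file (one doc per file) and every .csv row (one doc per row)."""
    docs: List[LocalDoc] = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix == ".txt":
            text = path.read_text(encoding="utf-8").strip()
            docs.append(LocalDoc(title=path.stem, text=text, docid=path.name))
        elif path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as f:
                for i, row in enumerate(csv.reader(f)):
                    # Join the non-empty cells of a row into one passage.
                    text = " ".join(cell.strip() for cell in row if cell.strip())
                    if text:
                        docs.append(
                            LocalDoc(title=path.stem, text=text, docid=f"{path.name}:{i}")
                        )
    return docs
```

The resulting list is what you would then index (FAISS, TFIDF) or score with a custom retriever.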

klshuster commented 2 years ago

I'm not sure I understand the question - are you saying you want to be able to retrieve over a local set of documents?

TKTSWalker commented 2 years ago

Pretty much!


klshuster commented 2 years ago

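You can look at the rag (https://github.com/facebookresearch/ParlAI/tree/main/parlai/agents/rag) and fid (https://github.com/facebookresearch/ParlAI/tree/main/parlai/agents/fid) models for that.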

We have implemented several retrieval methods for these models:

  1. FAISS-based: retrieve from a FAISS index (an index of dense embeddings of the documents)
  2. Search-based: query a search server
  3. TFIDF: retrieve from a TFIDF database

If any of those options fit, you can set --rag-retriever-type to dpr, search_engine, or tfidf, respectively. Instructions for building a FAISS index are in the rag link above; instructions for using TFIDF are at https://github.com/facebookresearch/ParlAI/tree/main/parlai/agents/tfidf_retriever (you would then set --tfidf-model-path to your TFIDF index). Instructions for using a search engine can be found by searching through the ParlAI issues.
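To make the TFIDF option concrete, here is a minimal pure-Python TF-IDF ranker over in-memory passages. It illustrates the technique only; it is not ParlAI's tfidf_retriever, whose index format and weighting may differ, and `TinyTfidf` is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidf:
    """Minimal TF-IDF ranker (illustrative only, not ParlAI's implementation)."""

    def __init__(self, docs: List[str]):
        self.docs = docs
        doc_tf = [Counter(tokenize(d)) for d in docs]
        n = len(docs)
        df = Counter(term for tf in doc_tf for term in tf)
        # Smoothed IDF: terms that appear in fewer documents get more weight.
        self.idf: Dict[str, float] = {
            t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()
        }
        self.doc_vecs = [self._weigh(tf) for tf in doc_tf]

    def _weigh(self, tf: Counter) -> Dict[str, float]:
        # TF * IDF, then L2-normalise; unseen terms get weight 0.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def search(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        # Cosine similarity between the query vector and each document vector.
        q = self._weigh(Counter(tokenize(query)))
        scores = [
            (i, sum(w * dv.get(t, 0.0) for t, w in q.items()))
            for i, dv in enumerate(self.doc_vecs)
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]
```

For example, `TinyTfidf(passages).search("how much do apples cost", k=2)` returns the indices and scores of the two best-matching passages.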

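If you want to implement your own search method, you can subclass the RagRetriever (https://github.com/facebookresearch/ParlAI/blob/7c2b199d0b315c9016072897d849811cfc8a5073/parlai/agents/rag/retrievers.py#L363) and implement a retrieve_and_score method. The other retrievers in that file show how the existing methods are implemented.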
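A sketch of the subclassing pattern, written without importing ParlAI so it runs on its own. `BaseRetriever`, `Doc`, and `KeywordRetriever` are hypothetical stand-ins; the real RagRetriever works with ParlAI's own tensor and Document types, so check the linked file for the exact retrieve_and_score signature before porting. The idea carries over: for each query in a batch, return the top-n documents and their scores.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved passage.
    title: str
    text: str
    docid: str


class BaseRetriever(ABC):
    """Hypothetical stand-in for RagRetriever: subclasses supply retrieve_and_score."""

    def __init__(self, n_docs: int = 2):
        self.n_docs = n_docs

    def retrieve(self, queries: List[str]) -> Tuple[List[List[Doc]], List[List[float]]]:
        return self.retrieve_and_score(queries)

    @abstractmethod
    def retrieve_and_score(
        self, queries: List[str]
    ) -> Tuple[List[List[Doc]], List[List[float]]]:
        ...


class KeywordRetriever(BaseRetriever):
    """Scores each local document by how many distinct query words it contains."""

    def __init__(self, docs: List[Doc], n_docs: int = 2):
        super().__init__(n_docs)
        self.docs = docs

    def retrieve_and_score(self, queries):
        all_docs, all_scores = [], []
        for query in queries:
            words = set(query.lower().split())
            # Sort on the score only, so Doc objects are never compared.
            scored = sorted(
                ((len(words & set(d.text.lower().split())), d) for d in self.docs),
                key=lambda pair: pair[0],
                reverse=True,
            )[: self.n_docs]
            all_docs.append([d for _, d in scored])
            all_scores.append([float(s) for s, _ in scored])
        return all_docs, all_scores
```

Swapping keyword overlap for a call into your own search function (a database query, a spreadsheet lookup) only changes the body of retrieve_and_score.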

TKTSWalker commented 2 years ago

Thank you! This is exactly what I was looking for!
