deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

synergy jina <> haystack #82

Closed hanxiao closed 2 years ago

hanxiao commented 4 years ago

Hi 👋 I'm just posting a thread here to brainstorm potential synergies between https://github.com/jina-ai/jina/ and haystack.

tholor commented 4 years ago

Hey @hanxiao ,

Sure, happy to explore some synergies. One idea could be to combine the QA functionality of haystack with the efficient backend implemented in Jina (incl. DB, pipelines, deployment ...).

Two options come to mind:

A. Add Jina as an alternative to Elasticsearch in Haystack

  1. Implement a JinaDocumentStore in haystack (to index text documents / embeddings / ...)
  2. Implement a JinaRetriever to find candidate documents via Jina's encoders etc.
  3. Stick it together with haystack's Reader to get a Finder
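
To make option A more concrete, here is a minimal sketch of how the three pieces could fit together. It is only an illustration: `JinaDocumentStore`, `JinaRetriever` and `Finder` are hypothetical classes defined right here, a plain dict stands in for a Jina index, keyword overlap stands in for Jina's encoders, and `toy_reader` is a placeholder for a real Reader model. None of this is existing Haystack or Jina API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    id: str
    text: str


@dataclass
class JinaDocumentStore:
    """Hypothetical store; a dict stands in for a Jina index."""
    _docs: Dict[str, Document] = field(default_factory=dict)

    def write_documents(self, docs: List[Document]) -> None:
        for doc in docs:
            self._docs[doc.id] = doc

    def get_all_documents(self) -> List[Document]:
        return list(self._docs.values())


class JinaRetriever:
    """Hypothetical retriever; keyword overlap stands in for Jina's encoders."""

    def __init__(self, document_store: JinaDocumentStore) -> None:
        self.document_store = document_store

    def retrieve(self, query: str, top_k: int = 3) -> List[Document]:
        terms = set(query.lower().split())
        scored = []
        for doc in self.document_store.get_all_documents():
            overlap = len(terms & set(doc.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; sort on the score only.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


def toy_reader(question: str, docs: List[Document]) -> List[dict]:
    # Placeholder for a real Reader: returns each candidate's text as the "answer".
    return [{"answer": d.text, "document_id": d.id} for d in docs]


class Finder:
    """Glue: retrieve candidates first, then let the reader extract answers."""

    def __init__(self, reader: Callable, retriever: JinaRetriever) -> None:
        self.reader = reader
        self.retriever = retriever

    def get_answers(self, question: str, top_k_retriever: int = 3) -> List[dict]:
        candidates = self.retriever.retrieve(question, top_k=top_k_retriever)
        return self.reader(question, candidates)


store = JinaDocumentStore()
store.write_documents([
    Document("1", "Berlin is the capital of Germany"),
    Document("2", "Paris is the capital of France"),
])
finder = Finder(reader=toy_reader, retriever=JinaRetriever(store))
answers = finder.get_answers("where is berlin", top_k_retriever=1)
```

The point of the sketch is the separation of concerns: the store only persists documents, the retriever only narrows them down, and the reader only sees the retrieved candidates.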

B. Add Haystack to Jina as "Encoders"

This one is less clear to me, as I haven't investigated Jina in detail yet. From our discussion, I understood that you would first need to extend the pipeline in Jina to allow an "extra step" after retrieval of the search results that executes our Reader to extract the granular answer span. A second modification might be to support two encoders (one for questions, one for documents). A rough sketch could be:

  1. Use Haystack model(s) as encoders in Jina (one for questions, one for docs)
  2. Retrieve search results "as usual" via Jina
  3. Add an extra container with one of Haystack's Readers that receives the retrieved results and extracts the answer span
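
The three steps of option B could look roughly like the sketch below. All names are hypothetical and nothing here calls Jina or Haystack: a bag-of-words vector over a tiny fixed vocabulary stands in for the two encoder models, a dot product stands in for Jina's retrieval, and `read` is a toy placeholder for a Reader that returns the first word of a retrieved document that was not part of the question.

```python
from typing import List, Optional

Vector = List[float]

# Tiny fixed vocabulary, assumed purely for illustration.
VOCAB = ["berlin", "paris", "capital", "germany", "france"]


def _bag_of_words(text: str) -> Vector:
    tokens = [t.strip(".?!,").lower() for t in text.split()]
    return [float(tokens.count(word)) for word in VOCAB]


def encode_question(question: str) -> Vector:
    # Step 1a: question encoder (would be a separate model in practice).
    return _bag_of_words(question)


def encode_document(document: str) -> Vector:
    # Step 1b: document encoder (would be a separate model in practice).
    return _bag_of_words(document)


def retrieve(question: str, docs: List[str], top_k: int = 1) -> List[str]:
    # Step 2: rank documents by dot product with the question vector.
    q = encode_question(question)
    scored = [
        (sum(a * b for a, b in zip(q, encode_document(doc))), doc)
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def read(question: str, docs: List[str]) -> Optional[str]:
    # Step 3: the "extra step" after retrieval. A real Reader would predict
    # an answer span; this toy returns the first word not in the question.
    asked = {t.strip(".?!,").lower() for t in question.split()}
    for doc in docs:
        for token in doc.split():
            word = token.strip(".?!,")
            if word and word.lower() not in asked:
                return word
    return None


def answer(question: str, docs: List[str]) -> Optional[str]:
    return read(question, retrieve(question, docs, top_k=1))


DOCS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
result = answer("What is the capital of Germany?", DOCS)
```

Keeping the reader as a separate final step mirrors the "extra container" idea: retrieval can stay unchanged while the span extraction is swapped in or out.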

What do you think?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs.

tholor commented 3 years ago

Let's reconsider a Jina DocumentStore. Possibly even a "JinaRetriever" on top later.

Some rough steps:

1) Check out the Jina CRUD REST API endpoints: https://api.jina.ai/rest/#operation/search_api_search_post
2) Implement a JinaDocumentStore with the basic methods:

Contributions very welcome :)

hanxiao commented 3 years ago

perfect, let me create a mirror ticket in our repo as well: https://github.com/jina-ai/jina/issues/2128

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.