deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
14.52k stars 1.7k forks

Documents from DataFrame #7956

Open hiwaveSupport opened 6 days ago

hiwaveSupport commented 6 days ago

Discussed in https://github.com/deepset-ai/haystack/discussions/3873

Originally posted by **stargeysir** January 17, 2023

Hi team, I am currently building a small project by myself where all text data is stored in a DataFrame, not in individual documents. Is it possible to turn the rows of a DataFrame into documents as well, so I can use a PreProcessor before updating my InMemoryDocumentStore? I have managed to write to the InMemoryDocumentStore after turning my DataFrame into a list of dictionaries, but it seems I need the Document format first if I wish to use a PreProcessor. Thanks!

anakin87 commented 6 days ago

Currently we are not supporting this feature out of the box, but there is a community project that might be helpful for you.

https://github.com/EdAbati/dataframes-haystack
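
For anyone who would rather do the conversion by hand, here is a minimal sketch. It is not the API of the community project above. It assumes the DataFrame has already been turned into a list of row dictionaries, as described in the original post (for example with `df.to_dict(orient="records")`). The helper name `rows_to_documents`, its `make_document` parameter, and the column names are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List


def rows_to_documents(
    records: Iterable[Dict[str, Any]],
    content_column: str,
    make_document: Callable[..., Any] = dict,
) -> List[Any]:
    """Turn row dictionaries into document objects (hypothetical helper).

    `make_document` is called with `content=` and `meta=` keyword arguments.
    The default `dict` keeps this sketch free of third-party dependencies.
    """
    documents = []
    for row in records:
        text = row.get(content_column)
        # Skip rows without usable text (missing, empty, or non-string values such as NaN).
        if not isinstance(text, str) or not text.strip():
            continue
        # Every other column travels along as metadata.
        meta = {key: value for key, value in row.items() if key != content_column}
        documents.append(make_document(content=text, meta=meta))
    return documents


# Stand-in for df.to_dict(orient="records"); the column names are made up.
records = [
    {"text": "Haystack builds LLM pipelines.", "source": "notes", "year": 2023},
    {"text": "", "source": "empty-row", "year": 2023},
    {"text": "Rows can become documents.", "source": "notes", "year": 2024},
]

docs = rows_to_documents(records, content_column="text")
print(len(docs))        # 2, the empty row is skipped
print(docs[0]["meta"])  # {'source': 'notes', 'year': 2023}
```

With Haystack installed, passing `make_document=Document` (after `from haystack import Document`) should yield `Document` objects instead of plain dictionaries, since `Document` accepts `content` and `meta` keyword arguments. Those objects can then go to a PreProcessor or be written to the document store.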

hiwaveSupport commented 5 days ago

Great - worked for me.