danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://danswer.ai

No indexing in batch when using ingestion api #2760

Open JCTaz9 opened 1 month ago

JCTaz9 commented 1 month ago

We use the ingestion API to embed a large number of documents, structured by section as shown in the documentation: https://docs.danswer.dev/backend_apis/ingestion Embedding all of our documents takes a very long time because each document requires its own API call. Would it be possible to embed in batches when using the ingestion API?

Right now it takes 13 days to embed what would take a few hours with the file connector.
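
For illustration, the one-call-per-document pattern described above might look roughly like the sketch below. The endpoint path, the payload fields, and the `send` callable are assumptions made for this example, not the documented API; see the ingestion docs linked above for the real schema.

```python
from typing import Callable

# Assumed endpoint path, for illustration only; the real path is in the ingestion docs.
INGESTION_URL = "http://localhost:8080/danswer-api/ingestion"


def build_payload(doc_id: str, sections: list[str]) -> dict:
    """Build one request body with the document split by section (assumed schema)."""
    return {
        "document": {
            "id": doc_id,
            "sections": [{"text": text} for text in sections],
        }
    }


def ingest_all(docs: dict[str, list[str]], send: Callable[[str, dict], None]) -> int:
    """Send every document in its own request and return the number of calls made.

    `send` is a hypothetical callable (for example a thin wrapper around an
    HTTP client's POST) so the sketch runs without a live server.
    """
    calls = 0
    for doc_id, sections in docs.items():
        send(INGESTION_URL, build_payload(doc_id, sections))
        calls += 1
    return calls
```

With N documents this makes N sequential round trips, each of which embeds a single document, which is where the total time goes.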

rkuo-danswer commented 1 month ago

Hi, this is definitely something we are looking at refactoring, as running the ingestion inside the endpoint doesn't fit with our background processing system. Have you looked into using the file connector?

Tracking refactoring of this as DAN-867.