danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Custom chunking strategy (splitting characters) #1183

Open Pasmikh opened 6 months ago

Pasmikh commented 6 months ago

I am trying to build a chatbot based on FAQ documentation. It uses a text file containing a list of question-answer pairs. However, the default chunking strategy sometimes splits a chunk in the middle of an answer or between a question and its answer.

This seriously undermines the quality of the answers. Is there any way to customise the chunking strategy so that each question AND its answer appear together, in full, in the same chunk?
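A minimal sketch of pair-aware chunking as a standalone pre-processing step. It assumes each question line starts with `Q:` and its answer follows on the next lines. That file format is an assumption, and `parse_faq_pairs` / `pack_pairs` are hypothetical helpers, not part of Danswer's chunker. Whole pairs are packed greedily into chunks under a character budget (a rough stand-in for a token budget), so a pair is never split. A single pair longer than the budget becomes its own oversized chunk.

```python
def parse_faq_pairs(text: str) -> list[str]:
    """Group lines into Q&A blocks; a new block starts at each line beginning with 'Q:'.

    Assumption: lines before the first 'Q:' are ignored, and blank lines are dropped.
    """
    pairs: list[list[str]] = []
    for line in text.splitlines():
        if line.startswith("Q:"):
            pairs.append([line])
        elif pairs and line.strip():
            pairs[-1].append(line)
    return ["\n".join(p) for p in pairs]


def pack_pairs(pairs: list[str], max_chars: int = 2000) -> list[str]:
    """Greedily pack whole Q&A pairs into chunks of at most max_chars; never split a pair.

    max_chars=2000 is an arbitrary illustrative default, not a Danswer setting.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for pair in pairs:
        extra = len(pair) + (2 if current else 0)  # +2 for the "\n\n" joiner
        if current and size + extra > max_chars:
            # Current chunk is full: flush it and start a new one with this pair.
            chunks.append("\n\n".join(current))
            current, size = [], 0
            extra = len(pair)
        current.append(pair)
        size += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For example, `pack_pairs(parse_faq_pairs(open("faq.txt").read()))` would yield chunks that each contain one or more complete question-answer pairs (`faq.txt` is a placeholder name).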

One idea is a special marker that indicates where a chunk should end, something like endoftext.
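The marker idea can be sketched as a pre-processing step that cuts the text at an explicit separator. This assumes you insert the marker into the source file yourself. `<|endofchunk|>` is a hypothetical marker chosen for illustration, and to my knowledge it is not a feature of Danswer's indexing pipeline. Empty pieces and surrounding whitespace are dropped.

```python
# Hypothetical marker for illustration; Danswer does not define or recognize it.
CHUNK_SEPARATOR = "<|endofchunk|>"


def split_on_separator(text: str, separator: str = CHUNK_SEPARATOR) -> list[str]:
    """Split text wherever the separator appears, trimming whitespace and dropping empty pieces."""
    return [piece.strip() for piece in text.split(separator) if piece.strip()]


def oversized(chunks: list[str], max_chars: int) -> list[int]:
    """Return indices of chunks longer than max_chars so they can be reviewed by hand."""
    return [i for i, chunk in enumerate(chunks) if len(chunk) > max_chars]
```

Because the author controls every boundary, a question and its answer can only end up apart if the marker is placed between them. The trade-off is that chunk sizes are no longer bounded automatically, which is why the sketch includes a simple size check.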

LrWm3 commented 6 months ago

Chunking based on headings (h1, h2, etc.) might be possible, but I don't know if it's implemented.
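If the FAQ were written as Markdown with each question as a heading, heading-based splitting could look like the sketch below. This is a standalone illustration, not Danswer's implementation. `split_on_headings` and its `max_level` parameter are hypothetical. It only recognizes ATX-style headings (`#` to `######` followed by a space or tab) and does not skip `#` lines inside fenced code blocks.

```python
import re


def split_on_headings(markdown: str, max_level: int = 2) -> list[str]:
    """Start a new section at every ATX heading of level <= max_level (e.g. h1 and h2).

    Text before the first heading becomes its own section. Deeper headings
    (e.g. ### when max_level=2) stay inside their parent section.
    Caveat: '#' lines inside fenced code blocks are also treated as headings.
    """
    pattern = re.compile(rf"^#{{1,{max_level}}}[ \t]", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    ends = starts[1:] + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(starts, ends)]
    return [s for s in sections if s]
```

With one question per `##` heading, each section holds exactly one question and its answer, as long as the answer itself is short enough to fit the embedding model's context.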