questions about the project, db, chunks size, openai model

danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

https://danswer.ai

questions about the project, db, chunks size, openai model #345

Open jignnsd opened 1 year ago

jignnsd commented 1 year ago

Hello, I'm trying out the project and I'm curious about which database it uses and where that database is deployed. Many thanks

yuhongsun96 commented 1 year ago

Hi! Postgres (as relational DB) + Vespa (as vector DB). Up until recently though, we used a combination of Qdrant + Typesense instead of Vespa.

Everything is stored on your machine where you deploy Danswer. There are no call-home functionalities (meaning we never send any user data back to us, not even usage or telemetry data).

jignnsd commented 1 year ago

Many thanks @yuhongsun96. Also, is there a way to change the chunk size and overlap? And how can I change which OpenAI model is used (gpt3.5, gpt3.5-16k, gpt4, etc.)? Many thanks

yuhongsun96 commented 1 year ago

To change the size of chunks and overlap you'd have to change the values here and build a new container: https://github.com/danswer-ai/danswer/blob/main/backend/danswer/configs/app_configs.py#L139

It's not configurable via environment variables because we don't recommend people mess with it. For example, if you increase the chunk size, you may start losing context in the embeddings because of the model context limit. But feel free to play with it!
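To make the chunk size and overlap trade-off concrete, here is a minimal sketch of overlapping chunking. It is not Danswer's implementation: the function names, the whitespace "tokenizer", and the default values (512 and 128) are all assumptions chosen for illustration, and the real values live in the config file linked above.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens.

    Hypothetical helper for illustration; the defaults are NOT Danswer's values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def tokens_lost(chunk, model_max_tokens):
    """Number of tokens an embedding model would drop if it truncates at its limit."""
    return max(0, len(chunk) - model_max_tokens)


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer (an assumption).
    text = "one two three four five six seven eight nine ten"
    for chunk in chunk_tokens(text.split(), chunk_size=4, overlap=1):
        print(chunk)
```

If a chunk is longer than the embedding model's context limit, `tokens_lost` shows how much of it would never reach the embedding, which is the risk described above when increasing the chunk size.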

For how to configure different models, you can check this: https://docs.danswer.dev/gen_ai_configs/open_ai

jignnsd commented 1 year ago

Perfect @yuhongsun96, I'll read the info.

TaridaGeorge commented 1 year ago

> Hi! Postgres (as relational DB) + Vespa (as vector DB). Up until recently though, we used a combination of Qdrant + Typesense instead of Vespa.
>
> Everything is stored on your machine where you deploy Danswer. There are no call-home functionalities (meaning we never send any user data back to us, not even usage or telemetry data).

Is there a reason you guys ditched Qdrant in favor of Vespa? What were the pros and cons?

yuhongsun96 commented 1 year ago

Yeah, we moved to Vespa because it had features we needed that Qdrant didn't support:

- multiple vectors per document
- custom scoring functions, which let us do time-related decay, learning from feedback, etc.
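As a rough illustration of what these two features make possible, the sketch below scores a document that has several chunk vectors and applies a time decay and a feedback boost. This is plain Python, not Vespa's ranking syntax and not Danswer's actual scoring function. The `Doc` fields, the half-life of 30 days, and the multiplicative feedback boost are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    chunk_vectors: list        # several embeddings per document
    age_days: float            # time since last update
    feedback_boost: float = 0.0  # hypothetical signal learned from user feedback


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(query_vector, doc, half_life_days=30.0):
    """Best-matching chunk similarity, scaled by recency and feedback."""
    # Multiple vectors per document: take the closest chunk to the query.
    best = max(cosine(query_vector, v) for v in doc.chunk_vectors)
    # Time-related decay: the score halves every `half_life_days`.
    decay = 0.5 ** (doc.age_days / half_life_days)
    return best * decay * (1.0 + doc.feedback_boost)


if __name__ == "__main__":
    fresh = Doc("a", [[0.0, 1.0], [1.0, 0.0]], age_days=0.0)
    stale = Doc("b", [[1.0, 0.0]], age_days=30.0)
    print(score([1.0, 0.0], fresh), score([1.0, 0.0], stale))
```

With a single vector per document and a fixed similarity metric, neither the per-chunk maximum nor the decay term could be expressed inside the search engine itself, which is the limitation described above.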