danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/
Other
10.36k stars, 1.24k forks

General issue: deleted pages still appear in results (tested with the Confluence integration; deleted Confluence pages remain indexed even after a full re-index) #1222

Open. vchaindz opened this issue 6 months ago.

vchaindz commented 6 months ago

Hi,

Behavior:

Expected behavior:

If a page is deleted in Confluence, any subsequent connector update (automatic update, manual update, or manual full index) should delete the corresponding indexed document.
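One way to meet this expectation is a reconciliation pass: compare the IDs the source currently reports with the IDs already indexed for that connector, and delete the difference. This is a minimal sketch, assuming the connector can list every page ID still present in the space (an incremental update that only fetches changed pages cannot see deletions on its own). `prune_deleted_documents` and the dict-based index are hypothetical stand-ins, not Danswer's API.

```python
from typing import Dict, Set


def prune_deleted_documents(source_ids: Set[str], index: Dict[str, str]) -> Set[str]:
    """Remove indexed documents whose IDs no longer exist in the source.

    Hypothetical sketch: `index` maps document ID -> indexed content and stands
    in for the real search index. Assumes `source_ids` is the COMPLETE set of
    page IDs currently in the source, not just recently changed ones.
    Returns the set of IDs that were removed.
    """
    stale = set(index) - source_ids  # indexed, but gone from the source
    for doc_id in stale:
        del index[doc_id]
    return stale


# Usage: page "p2" was deleted in Confluence since the last run.
index = {"p1": "Onboarding guide", "p2": "Old runbook", "p3": "FAQ"}
removed = prune_deleted_documents({"p1", "p3"}, index)
print(sorted(removed))  # ['p2']
print(sorted(index))    # ['p1', 'p3']
```

The key design point is that deletion is detected by set difference against a full listing; a connector that only reports "what changed since last time" has no record of pages that simply disappeared.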

vchaindz commented 6 months ago

Additional information: the document is deleted from the Vespa index, but chat and document explorer responses still show the deleted content. I think this is a general issue, not specific to any integration. I still see the entries in Postgres. Maybe Postgres needs to be cleaned up as well? Related: https://github.com/danswer-ai/danswer/issues/938 https://github.com/danswer-ai/danswer/pull/1086
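The symptom described here is consistent with a half-completed delete: the document is gone from the search index but its row remains in the relational store that other views may read from. The sketch below illustrates that failure mode under stated assumptions only: an in-memory SQLite table stands in for Postgres and a dict stands in for Vespa. Danswer's real schema and deletion path are not shown in this thread, so the table and function names (`document`, `delete_document`, `list_documents`) are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_stores() -> Tuple[sqlite3.Connection, Dict[str, str]]:
    # Stand-in for Postgres: a relational table of document metadata.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE document (id TEXT PRIMARY KEY, title TEXT)")
    # Stand-in for the Vespa index: document ID -> indexed text.
    vector_index: Dict[str, str] = {}
    return db, vector_index


def add_document(db: sqlite3.Connection, vector_index: Dict[str, str],
                 doc_id: str, title: str, text: str) -> None:
    with db:  # commit the metadata row
        db.execute("INSERT INTO document (id, title) VALUES (?, ?)", (doc_id, title))
    vector_index[doc_id] = text


def delete_document(db: sqlite3.Connection, vector_index: Dict[str, str],
                    doc_id: str) -> None:
    # Remove the document from BOTH stores; deleting from only one of them
    # leaves it visible to any view that reads the other.
    vector_index.pop(doc_id, None)
    with db:
        db.execute("DELETE FROM document WHERE id = ?", (doc_id,))


def list_documents(db: sqlite3.Connection) -> List[str]:
    # Explorer-style listing that reads only from the metadata store.
    return [row[0] for row in db.execute("SELECT id FROM document ORDER BY id")]


db, idx = make_stores()
add_document(db, idx, "p1", "FAQ", "faq text")
add_document(db, idx, "p2", "Old runbook", "runbook text")
delete_document(db, idx, "p2")
print(list_documents(db), sorted(idx))  # ['p1'] ['p1']

# Removing from the index only reproduces the reported symptom:
add_document(db, idx, "p3", "Draft", "draft text")
idx.pop("p3")
print(list_documents(db), sorted(idx))  # ['p1', 'p3'] ['p1']
```

In a real system the two deletes cannot share one transaction (the index and the database are separate services), so a retryable cleanup job that re-checks both stores is a common way to converge after a partial failure.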

mboret commented 5 months ago

Additional information: even after the connector is deleted, the chat, search, and Slack bot can still answer based on documents indexed by that connector. IMO this is a big issue: even when we improve or clean up our source documentation, Danswer doesn't reflect it. @Weves @yuhongsun96

Weves commented 5 months ago

@mboret is there a way for us to reproduce this? That is definitely not intended: once a connector is deleted, none of its documents should be searchable.
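A reproduction would check that invariant directly after deleting a connector. One detail worth covering in such a check, under the assumption (not confirmed in this thread) that the same document can be indexed by more than one connector: deletion should drop the documents that no remaining connector references, while a document still shared with another connector would legitimately stay searchable. The `DocumentRegistry` class below is a hypothetical in-memory model, not Danswer's code.

```python
from collections import defaultdict
from typing import Dict, Set


class DocumentRegistry:
    """Hypothetical in-memory model: which connectors reference each document."""

    def __init__(self) -> None:
        # doc ID -> IDs of the connectors that indexed it
        self.refs: Dict[str, Set[str]] = defaultdict(set)
        # doc IDs that search can currently return
        self.index: Set[str] = set()

    def index_document(self, connector_id: str, doc_id: str) -> None:
        self.refs[doc_id].add(connector_id)
        self.index.add(doc_id)

    def delete_connector(self, connector_id: str) -> Set[str]:
        """Detach a connector and drop documents no other connector references."""
        removed: Set[str] = set()
        for doc_id in list(self.refs):  # copy keys: we delete while iterating
            owners = self.refs[doc_id]
            owners.discard(connector_id)
            if not owners:
                del self.refs[doc_id]
                self.index.discard(doc_id)
                removed.add(doc_id)
        return removed


reg = DocumentRegistry()
reg.index_document("confluence", "p1")
reg.index_document("confluence", "p2")
reg.index_document("web", "p2")  # p2 is also indexed by a second connector

removed = reg.delete_connector("confluence")
print(sorted(removed))    # ['p1']
print(sorted(reg.index))  # ['p2'] (still referenced by "web")
```

Reference counting keeps one connector's deletion from removing content another connector still supplies; a regression test would assert that every document referenced only by the deleted connector is absent from search afterward.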