Azure / gpt-rag-ingestion

MIT License

Document deletion/update flow automation #54

Open Foco22 opened 9 months ago

Foco22 commented 9 months ago

Hi,

Please add automation for when a file is deleted or modified. It should delete that file's records from both indexers and its chunks in the storage account, so the solution doesn't give answers based on deleted or outdated data.

Index and chunks:

[Screenshots: index; MicrosoftTeams-image (8)]

Best Regards

framigni commented 7 months ago

I second this request. It's an essential requirement that the documentation is kept dynamically updated and that the chat answers are not "polluted" by obsolete information. Destroying and re-creating the Index and Indexer on every update or deletion is not practical. My understanding is that this is a limitation of the Azure Search service, so it is probably a bigger issue.

Best Regards

gbecerra1982 commented 7 months ago

We are already working on this feature; it will be available soon.

framigni commented 7 months ago

@gbecerra1982 @Foco22 I think there is actually a solution already, and it works for me:

First, enable soft delete for blobs (as described in https://learn.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal) on the Storage Account that holds your document container.

Then, configure native soft delete (as described in https://learn.microsoft.com/en-us/azure/search/search-howto-index-changed-deleted-blobs?tabs=portal) in the Search service, under the Data Source settings.

After that, once a document has been deleted from the document container, all knowledge of that document is removed as well at the next Indexer run (and document_chunking execution).

I guess the only thing left now would be to embed the two settings into the deployment package/script.
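For anyone wanting to script the second setting, here is a minimal sketch of how the data source definition could carry the native soft delete policy when sent to the Search REST API. It is not taken from this repo's deployment scripts: the function names, the data source and container names, and the `api_version` value are all placeholders you would need to replace, and the code only builds the request without sending it. Blob soft delete still has to be enabled on the Storage Account separately.

```python
import json
import urllib.request

# Policy type named in the Azure AI Search docs for native blob soft delete.
SOFT_DELETE_POLICY = "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"


def build_data_source(name, container, connection_string):
    """Build a blob data source definition with native soft delete detection.

    All argument values are supplied by the caller; nothing here is read
    from the gpt-rag-ingestion deployment (hypothetical helper).
    """
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
        "dataDeletionDetectionPolicy": {"@odata.type": SOFT_DELETE_POLICY},
    }


def build_put_request(service, api_version, admin_key, data_source):
    """Prepare (but do not send) a create-or-update request for the data source.

    api_version is an assumption: use a version of the Search REST API
    that supports this policy in your environment.
    """
    url = (
        f"https://{service}.search.windows.net/datasources/"
        f"{data_source['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(data_source).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )
```

Passing the prepared request to `urllib.request.urlopen` would apply the change. Building the payload separately from sending it makes it easy to review the definition, or to drop the same JSON into an existing deployment script instead.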