Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Shelf life of data when using upload function - auto delete of data after x days #1783

Open RobSch1406 opened 2 months ago

RobSch1406 commented 2 months ago

e.g.:

    vector_store = client.beta.vector_stores.create_and_poll(
        name="Product Documentation",
        file_ids=['file_1', 'file_2', 'file_3', 'file_4', 'file_5'],
        expires_after={
            "anchor": "last_active_at",
            "days": 7
        }
    )



This is the link:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/file-search?tabs=python

Have you already tried to implement this in the current code?

Where should I start?
Thanks and best regards

pamelafox commented 2 months ago

I have not tried to implement such a feature, no. This is the first time it's been requested.

To implement it, we need a few things:

  1. A way of marking the expiration time of a file (if it's not going to be a global expiration). We could potentially store that in the metadata of the blob, I suppose.
  2. A way of querying for expired files. Perhaps using https://learn.microsoft.com/en-us/azure/storage/blobs/storage-manage-find-blobs?tabs=azure-portal if we're using Blob metadata. Otherwise we'd have to do a brute-force scan over every blob and check each one.
  3. A cron job for deleting expired data. Since we're hosted on App Service, that'd be via WebJobs: https://learn.microsoft.com/en-us/azure/app-service/webjobs-create
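
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions, not code from the repo: the `expires_on` metadata key and the helper names are hypothetical, and `container` is assumed to behave like an `azure.storage.blob.ContainerClient` (only its `list_blobs(include=["metadata"])` and `delete_blob(name)` methods are used). It takes the brute-force route for step 2.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical metadata key; the repo does not define one.
EXPIRES_KEY = "expires_on"


def expiry_metadata(days, now=None):
    """Step 1: build the metadata dict to attach to a blob at upload time.

    `now` must be timezone-aware if given; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    return {EXPIRES_KEY: (now + timedelta(days=days)).isoformat()}


def find_expired(blobs, now=None):
    """Step 2: brute-force scan returning the names of expired blobs.

    Each item needs `.name` and `.metadata` attributes. Blobs without
    the expiry key are treated as never expiring.
    """
    now = now or datetime.now(timezone.utc)
    expired = []
    for blob in blobs:
        stamp = (blob.metadata or {}).get(EXPIRES_KEY)
        if stamp and datetime.fromisoformat(stamp) <= now:
            expired.append(blob.name)
    return expired


def delete_expired(container, now=None):
    """Step 3: the body of the scheduled job; returns the deleted names."""
    names = find_expired(container.list_blobs(include=["metadata"]), now)
    for name in names:
        container.delete_blob(name)
    return names
```

At upload time the result of `expiry_metadata(7)` would be passed as the blob's metadata, and the scheduled job would call `delete_expired(container)`. Deleting a blob this way does not remove the chunks already indexed in Azure AI Search, so a full implementation would also need to delete the matching documents from the index.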

So I think it is doable, but it is a decent amount of work. I likely will not be implementing it anytime soon, as other features are requested more often. If you do implement it, we'd love to see it in a branch or PR.