Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Make Integrated Vectorizer compatible with Azure Data Lake Storage 2 to use the ACL #1744

Open dulalbert opened 5 months ago

dulalbert commented 5 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

azd env set USE_FEATURE_INT_VECTORIZATION true

Any log messages given by the failure

Expected/desired behavior

In /app/backend/prepdocslib/integratedvectorizerstrategy.py the indexer can only be created with type="azureblob", so it does not work with Azure Data Lake Storage 2. Is it possible to make it compatible with ADLS2 so we can use the ACLs we have set up and Entra ID? Thanks

OS and Version?

macOS 14

azd version?

azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)

Versions

2024-05-15

Mention any other details that might be useful


Thanks! We'll be in touch soon.

pamelafox commented 5 months ago

cc @mattgotteiner @srbalakr

I hear two issues here:

  1. The desire to use ADLS2 as the indexer source for integrated vectorization
  2. The desire to use the ACLs from that ADLS2 account with the ACL feature of this repository

I believe neither of those features is supported at this time for integrated vectorization, and supporting them would require a change to the default skills, or the creation of custom skills.

I've CCed the search team who can confirm or comment further.

dulalbert commented 5 months ago

@pamelafox For the first issue, it is already available according to the documentation. But yes, for the second point I would need to be able to access the ACLs from ADLS2.
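For reference, here is a minimal sketch of what an ADLS Gen2 data source definition for an indexer might look like, as opposed to the type="azureblob" one mentioned in the issue. It only builds the JSON body as I understand the Azure AI Search data source definition; the function name, data source name, container name, and resource ID are all placeholders, and nothing here confirms that the integrated vectorization skillset in this repository works unchanged against such a source.

```python
import json


def build_adls_datasource(name: str, container: str, storage_resource_id: str) -> dict:
    """Build a data source definition body (hypothetical helper, not part of the repo).

    Assumption: the storage account is accessed with a managed identity, so the
    connection string uses the "ResourceId=...;" form rather than an account key.
    """
    return {
        "name": name,
        # "adlsgen2" instead of the "azureblob" value mentioned in the issue
        "type": "adlsgen2",
        "credentials": {"connectionString": f"ResourceId={storage_resource_id};"},
        "container": {"name": container},
    }


# Placeholder values for illustration only.
definition = build_adls_datasource(
    name="my-index-adls",
    container="gptkbcontainer",
    storage_resource_id=(
        "/subscriptions/<sub-id>/resourceGroups/<rg>"
        "/providers/Microsoft.Storage/storageAccounts/<account>"
    ),
)
print(json.dumps(definition, indent=2))
```

Note that this only covers the first point (ADLS2 as the indexer source). It does not carry any ACL information into the index.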

mattgotteiner commented 4 months ago

Yes this is a good question

advanced-flow commented 4 months ago

So did I understand correctly that it is currently not possible to use document level access control with "integrated vectorisation" activated? Because I'm currently trying it out, and I'm wondering if I just haven't implemented something quite right....
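To make the question concrete: document-level access control generally depends on a query-time filter over the oids and groups fields, so if those fields are empty in the index the filter matches nothing for restricted documents. Below is a rough sketch of such a filter using the OData `search.in` pattern; the function name is hypothetical and the exact filter this repository builds may differ.

```python
from typing import List, Optional


def build_security_filter(oid: Optional[str], groups: List[str]) -> Optional[str]:
    """Build an OData filter that keeps only documents the user may read.

    Hypothetical helper: assumes the index has collection fields named
    "oids" and "groups", as mentioned in this thread.
    """

    def quote(value: str) -> str:
        # Escape single quotes for OData string literals.
        return value.replace("'", "''")

    clauses = []
    if oid:
        clauses.append(f"oids/any(g:search.in(g, '{quote(oid)}'))")
    if groups:
        joined = ", ".join(quote(g) for g in groups)
        clauses.append(f"groups/any(g:search.in(g, '{joined}'))")
    if not clauses:
        # No identity information: the caller must decide how to handle this
        # (for example, by rejecting the request) rather than searching unfiltered.
        return None
    return "(" + " or ".join(clauses) + ")"


example_filter = build_security_filter("user-oid-1", ["group-a", "group-b"])
print(example_filter)
```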

DuboisABB commented 4 months ago

+1 on this request. I just spent the last two days scratching my head, wondering why the oids and groups fields were empty in the index. At the very least, please add this current limitation to the documentation.
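As an illustration of what would have to happen for those fields to be populated, here is a minimal sketch that turns an ADLS Gen2 POSIX-style ACL string into values for the oids and groups fields. The function name and the sample ACL are made up, and this is only one possible reading of "has access" (any named user or group entry with read permission); it is not something the integrated vectorization path does today.

```python
from typing import Dict, List


def acl_to_index_fields(acl: str) -> Dict[str, List[str]]:
    """Extract object IDs with read permission from an ACL string.

    Hypothetical helper. Expects entries like "user:<object-id>:r-x",
    separated by commas.
    """
    fields: Dict[str, List[str]] = {"oids": [], "groups": []}
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        if len(parts) != 3:
            continue  # skips "default:..." entries and malformed input
        scope, principal, permissions = parts
        if not principal:
            continue  # owning user/group, mask, and other have no object ID
        if "r" not in permissions:
            continue  # only read access matters for retrieval
        if scope == "user":
            fields["oids"].append(principal)
        elif scope == "group":
            fields["groups"].append(principal)
    return fields


# Made-up ACL string for illustration.
sample_acl = (
    "user::rwx,user:11111111-aaaa:r-x,group::r-x,"
    "group:22222222-bbbb:r--,group:33333333-cccc:-w-,mask::r-x,other::---"
)
print(acl_to_index_fields(sample_acl))
```

The resulting lists would still need to be written to the matching documents in the index by some separate step.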

redur commented 1 month ago

Hi @mattgotteiner,

Will this feature be implemented in the near future?

What are the current limitations when ADLS is used as the source for the indexer? What is the recommended workaround for setting access control on the index, starting from documents in ADLS storage?

Thanks!

We might be able to make a contribution here if this is something that is possible / desired by you.

mattgotteiner commented 1 month ago

Thanks for letting me know this is a highly desired feature. I will take a look at this again. Appreciate the ping.