Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

support for this repo in Azure Government Cloud? #1475

Open delbert opened 7 months ago

delbert commented 7 months ago

hello Everyone,

does anyone know if there is an existing fork of this repo that runs in Azure Government Cloud?

if not, does anyone know of any issues with the underlying services that would prevent me from doing such a port, as long as i change over the referenced endpoints and Client mechanisms, such as using:

SearchEndpointClient( endpoint=, credential=, audience=AzureAuthorityHosts.AZURE_GOVERNMENT)
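A minimal sketch of that kind of client setup, under some assumptions. The class names in the `azure-search-documents` package are `SearchClient` and `SearchIndexClient`. As far as I know, `AzureAuthorityHosts.AZURE_GOVERNMENT` is the login host (`login.microsoftonline.us`) that goes to the credential's `authority` argument, while the search client's `audience` argument expects the search audience URL. `CloudSettings`, `search_endpoint`, and `make_search_client` are hypothetical names for illustration, not part of this repo, and the suffix and audience values should be checked against your own services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudSettings:
    """Per-cloud values that differ between Azure public and Azure Government."""
    authority_host: str   # Microsoft Entra ID login host, passed to the credential
    search_suffix: str    # DNS suffix of Azure AI Search service endpoints
    search_audience: str  # token audience, passed to the search client


# Assumed values; confirm them against the endpoints shown for your services.
PUBLIC = CloudSettings(
    "login.microsoftonline.com", "search.windows.net", "https://search.azure.com"
)
GOVERNMENT = CloudSettings(
    "login.microsoftonline.us", "search.azure.us", "https://search.azure.us"
)


def search_endpoint(service_name: str, cloud: CloudSettings) -> str:
    """Build the service endpoint URL for the given cloud."""
    return f"https://{service_name}.{cloud.search_suffix}"


def make_search_client(service_name: str, index_name: str, cloud: CloudSettings):
    """Create a SearchClient that authenticates against the given cloud."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.search.documents import SearchClient

    credential = DefaultAzureCredential(authority=cloud.authority_host)
    return SearchClient(
        endpoint=search_endpoint(service_name, cloud),
        index_name=index_name,
        credential=credential,
        audience=cloud.search_audience,
    )
```

Keeping the per-cloud values in one place means a port only has to swap `PUBLIC` for `GOVERNMENT` instead of editing each client call.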

thanks in advance for any help or pointers

danieljurek commented 7 months ago

~@delbert -- No fork necessary. Well-known sovereign clouds are already supported. https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/sovereign-clouds~

~Please give it a try and let us know if you have any feedback!~

I misread the context on this issue. Let me take a look at this specific template.

delbert commented 7 months ago

thanks, @danieljurek! i am specifically looking at code in prepdocs.py, searchmanager.py, and strategy.py -- trying to ingest some sample docs with Azure OpenAI, AI Search, and blob storage, all in Gov Cloud

pamelafox commented 7 months ago

I am not aware of any issues, but I am not positive that it has been deployed to the Government Cloud. I did look into removing the hard-coded URLs in our Bicep files, but ran into an issue where not all government URLs were available programmatically (https://github.com/Azure/bicep/issues/12482). But that shouldn't affect you, since you know you only want it working for Government Cloud. Please report back if you try it and discover any issues.

delbert commented 7 months ago

thanks, @pamelafox. curiouser and curiouser. you start pulling on a thread and it's amazing what comes out of the tapestry at you.
we will definitely report back what we find. we have started by manually deploying the services via the portal in gov cloud (with private endpoints and policy and rbac restrictions in place) and then are running the prepdocs sequence and deploying the chat app. hoping for a chat-on-data solution relatively quickly. we are also trying the template as is, in a gov cloud sandbox subscription with no restrictions on public ip, policy, and rbac to see how that goes.

pamelafox commented 7 months ago

There's also the integrated vectorization approach that you can enable in this repository, versus running the local script. The search team says that Integrated Vectorization is compatible with government cloud, as long as you have an OpenAI in that cloud.

delbert commented 7 months ago

thanks! super-helpful, at this point. i'll give it a try

mattgotteiner commented 7 months ago

I am wondering if the hardcoded DNS zones (e.g. search.windows.net) would interfere with deploying to a different cloud

delbert commented 7 months ago

unfortunately, this limitation will disqualify Integrated Vectorization for almost any Gov Cloud implementation:

Shared private link connections to a vectorizer is not supported

delbert commented 7 months ago

> I am wondering if the hardcoded DNS zones (e.g. search.windows.net) would interfere with deploying to a different cloud

yes it would. in my case, though, the customer uses their own DNS service, not Azure's
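One way to gauge how much the hardcoded zones matter for a port is to scan the templates for public-cloud suffixes. This is a rough sketch only: `PUBLIC_TO_GOV` and `find_hardcoded_suffixes` are hypothetical names, not part of this repo, and the suffix pairs are assumptions to verify for the services you actually deploy.

```python
# Assumed public-to-Government suffix pairs; verify them for the services you use.
PUBLIC_TO_GOV = {
    "search.windows.net": "search.azure.us",
    "openai.azure.com": "openai.azure.us",
    "blob.core.windows.net": "blob.core.usgovcloudapi.net",
}


def find_hardcoded_suffixes(template_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, public_suffix, gov_suffix) for each hardcoded suffix."""
    hits = []
    for line_number, line in enumerate(template_text.splitlines(), start=1):
        for public_suffix, gov_suffix in PUBLIC_TO_GOV.items():
            if public_suffix in line:
                hits.append((line_number, public_suffix, gov_suffix))
    return hits


# Example with made-up template lines: only the first one contains a listed suffix.
sample = "name: 'privatelink.search.windows.net'\nname: 'privatelink.example.net'"
for line_number, public_suffix, gov_suffix in find_hardcoded_suffixes(sample):
    print(f"line {line_number}: {public_suffix} -> {gov_suffix}")
```

If DNS is handled outside Azure, as in the case above, the scan still shows which endpoint names the external DNS service has to resolve.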

pamelafox commented 6 months ago

@delbert Were you ever able to make progress on this? We've had other developers inquire.