Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
6.38k stars 4.26k forks

Principal does not have access to API/Operation #1837

Open thracian2015 opened 4 months ago

thracian2015 commented 4 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. I'm running the app locally. I was able to install and verify all services except document intelligence which fails to load the files.
  2. To isolate and repeat the issue, I'm running prepdocs.ps1 which runs prepdocs.py.
  3. From what I've gathered, all permissions are granted to the doc intelligence service identity, including Owner (see image 2024-07-18_18-12-57).

Any log messages given by the failure

Running "prepdocs.py"
./scripts/prepdocs.py "C:\Projects\AzureSearchOpenAI/data/" --verbose --subscriptionid dc9a5b0e-3f65-466b-9dc9-16258235d9e7 --storageaccount stnejioursfhmeq --container content --storageresourcegroup rg_azureaisearch --searchservice -aisearch --index gptkbindex --searchsecretname searchServiceSecret --openaihost "azure" --openaimodelname "text-embedding-ada-002" --openaiservice "-ai" --openaideployment "embedding" --openaikey "" --openaiorg "" --documentintelligenceservice -docintelligence --tenantid e7b81d0a-a949-4103-83dc-feff6277c109 --useacls --keyvaultname kv-nejioursfhmeq
C:\Projects\AzureSearchOpenAI\scripts\prepdocs.py:253: SyntaxWarning: invalid escape sequence '\d'
  epilog="Example: prepdocs.py '..\data*' --storageaccount myaccount --container mycontainer --searchservice mysearch --index myindex -v",
Processing files...
Using local files in C:\Projects\AzureSearchOpenAI/data/
Ensuring search index gptkbindex exists
Search index gptkbindex already exists
Parsing 'Benefit_Options.pdf'
Extracting text from 'C:\Projects\AzureSearchOpenAI/data\Benefit_Options.pdf' using Azure Document Intelligence
Traceback (most recent call last):
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocs.py", line 429, in <module>
    loop.run_until_complete(main(ingestion_strategy, azd_credential, args))
  File "C:\Python\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocs.py", line 247, in main
    await strategy.run(search_info)
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocslib\filestrategy.py", line 64, in run
    pages = [page async for page in processor.parser.parse(content=file.content)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocslib\pdfparser.py", line 58, in parse
    poller = await document_intelligence_client.begin_analyze_document(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\core\tracing\decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_operations.py", line 372, in begin_analyze_document
    raw_result = await self._analyze_document_initial( # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_operations.py", line 130, in _analyze_document_initial
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\core\exceptions.py", line 164, in map_error
    raise error
azure.core.exceptions.ClientAuthenticationError: (PermissionDenied) Principal does not have access to API/Operation.
Code: PermissionDenied
Message: Principal does not have access to API/Operation.


drajinvites82 commented 4 months ago

I am facing the same issue while running the app locally in a shared environment using the command - ./start.ps1.

Below is the error message for reference.


Starting backend

pamelafox commented 4 months ago

Your user seems to be missing Cognitive Services User: https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/ai-machine-learning#cognitive-services-user

That should have been assigned during the azd up process by this Bicep:

// For both document intelligence and computer vision
module cognitiveServicesRoleUser 'core/security/role.bicep' = {
  scope: resourceGroup
  name: 'cognitiveservices-role-user'
  params: {
    principalId: principalId
    roleDefinitionId: 'a97b65f3-24c7-4388-baec-2e87135dc908'
    principalType: principalType
  }
}

pamelafox commented 4 months ago

@drajinvites82 Your issue is different, since it's for the OpenAI user. I notice your logs say "INFO:azure.identity.aio._credentials.chained:DefaultAzureCredential acquired a token from ManagedIdentityCredential". Locally, I would expect them to say "INFO:azure.identity.aio._credentials.chained:DefaultAzureCredential acquired a token from AzureDeveloperCliCredential". So for some reason it's fetching the identity from a managed identity instead of the azd credential, perhaps due to some environment variables.

You could fix those env variables, or change this line in app.py to explicitly use AzureDeveloperCliCredential:

azure_credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
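
The second option could look something like the following minimal sketch, which builds the azd credential directly instead of letting the chain pick one. The helper name make_local_credential and its tenant_id parameter are hypothetical and not part of app.py; AzureDeveloperCliCredential comes from azure.identity.aio and only works after azd auth login has been run in the same environment.

```python
def make_local_credential(tenant_id=None):
    """Return a credential that only uses the `azd auth login` session.

    Hypothetical helper for local development; not part of the repo's app.py.
    """
    # Imported inside the function so this sketch can be loaded even where
    # the azure-identity package is not installed.
    from azure.identity.aio import AzureDeveloperCliCredential

    kwargs = {}
    if tenant_id:
        # Only needed when the resources live in a non-default tenant.
        kwargs["tenant_id"] = tenant_id
    return AzureDeveloperCliCredential(**kwargs)


# In app.py this would stand in for the DefaultAzureCredential(...) line:
# azure_credential = make_local_credential()
```

Because this removes every other credential from consideration, it is probably only suitable for local runs; a deployed app would still need a managed identity or similar.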

thracian2015 commented 4 months ago

@pamelafox, unfortunately the same error after assigning Cognitive Services User to the Document Intelligence Service->Resource Management -> Identity. I assume from previous replies that it needs to be added to the Identity section and not to Access Control (IAM) which has me as Owner anyway. What else could be wrong? I'm using the Free SKU.

Parsing 'Benefit_Options.pdf'
Extracting text from 'C:\Projects\AzureSearchOpenAI/data\Benefit_Options.pdf' using Azure Document Intelligence
Traceback (most recent call last):
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocs.py", line 429, in <module>
    loop.run_until_complete(main(ingestion_strategy, azd_credential, args))
  File "C:\Python\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocs.py", line 247, in main
    await strategy.run(search_info)
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocslib\filestrategy.py", line 64, in run
    pages = [page async for page in processor.parser.parse(content=file.content)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts\prepdocslib\pdfparser.py", line 58, in parse
    poller = await document_intelligence_client.begin_analyze_document(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\core\tracing\decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_operations.py", line 372, in begin_analyze_document
    raw_result = await self._analyze_document_initial( # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_operations.py", line 130, in _analyze_document_initial
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "C:\Projects\AzureSearchOpenAI\scripts.venv\Lib\site-packages\azure\core\exceptions.py", line 164, in map_error
    raise error
azure.core.exceptions.ClientAuthenticationError: (PermissionDenied) Principal does not have access to API/Operation.
Code: PermissionDenied
Message: Principal does not have access to API/Operation.

cdhinrichs commented 3 months ago

This problem seems to be the type that has several possible causes that all lead to the same error mode. I had this problem, but my solution was unlike the others above. What I found helpful was when I (finally) tracked down the actual behavior of DefaultAzureCredential, which was unexpected to me. This behavior is rather well documented, but only in the comments of the class definition itself:

class DefaultAzureCredential(ChainedTokenCredential):
    """A default credential capable of handling most Azure SDK authentication scenarios.
    The identity it uses depends on the environment. When an access token is needed, it requests one using these
    identities in turn, stopping when one provides a token:

    1. A service principal configured by environment variables. See :class:`~azure.identity.aio.EnvironmentCredential`
       for more details.
    2. WorkloadIdentityCredential if environment variable configuration is set by the Azure workload
       identity webhook.
    3. An Azure managed identity. See :class:`~azure.identity.aio.ManagedIdentityCredential` for more details.
    4. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio. If multiple
       identities are in the cache, then the value of  the environment variable ``AZURE_USERNAME`` is used to select
       which identity to use. See :class:`~azure.identity.aio.SharedTokenCacheCredential` for more details.
    5. The identity currently logged in to the Azure CLI.
    6. The identity currently logged in to Azure PowerShell.
    7. The identity currently logged in to the Azure Developer CLI.
    """

In the case of this repo, azure-search-openai-demo, I have empirically determined that the expected path is #7 -- relying on the cached azd auth login credentials. Specifically, when you do azd up, the provisioning process relies on your Azure user account being the "Principal", who has permission / IAM role[s] to access all of the needed resources, such as the openai_client and so on.

I know this now because my problem was that I was getting stuck in #1: I had AZURE_CLIENT_ID set as an environment variable (because I'm working with several Azure projects, and if you're reading this you may be as well). The first thing I tried was to set AZURE_CLIENT_ID (and AZURE_CLIENT_SECRET and AZURE_CLIENT_SECRET_ID!!!!) to the correct App ID for the Server App that I registered when following the setup instructions for this project. To my surprise, that did not solve the problem. Looking into it, I found that the App itself was being treated as the "Principal", and the App is not set up to have those permissions. Only my user account is set up that way.
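
A quick way to check for this situation is to list which service principal variables are set in the shell that launches the app. This is only a sketch: the function name is hypothetical, and the variable list covers just the client-secret configuration that EnvironmentCredential documents (tenant ID, client ID, client secret), not certificate-based or other setups.

```python
import os

# Variables that EnvironmentCredential (step 1 of the chain) reads to sign in
# as a service principal with a client secret.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def set_service_principal_vars(env=None):
    """Return the service principal variables that have a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if env.get(name)]


if __name__ == "__main__":
    found = set_service_principal_vars()
    if found:
        print("Set in this shell:", ", ".join(found))
    else:
        print("No service principal variables are set.")
```

If any of these show up unexpectedly, unsetting them for the session is one thing to try before re-running the app, on the assumption that the chain then falls through to a later credential.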

Again, if you're reading this, your problem may be different: if you get anything other than #7, you are liable to get this error message (unless you've taken further steps to grant IAM roles to whatever "Principal" you end up with).

This code snippet was greatly helpful in diagnosing the problem:

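import asyncio
from pprint import pprint

import jwt  # provided by the PyJWT package
from azure.identity.aio import DefaultAzureCredential


async def foo():
    # Use the same credential settings as the app so the same identity is resolved.
    async with DefaultAzureCredential(exclude_shared_token_cache_credential=True) as cred:
        token = await cred.get_token("https://cognitiveservices.azure.com/.default")
    # Skip signature verification: we only want to inspect the claims.
    return jwt.decode(token.token, options={"verify_signature": False})


# asyncio.run needs the coroutine object, so call foo() rather than passing foo.
pprint(asyncio.run(foo()))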

If you went through #7 above, the sub (as in Subject, which is a different way of referring to the "Principal",) field should NOT be a UUID, but a different format of random strings (10 random characters, a '_', then 32 random characters). You should also see the name and email from your user account. The appid field is populated with a UUID that I don't recognize - maybe Azure sets each user "Principal" up with a pseudo app id.
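
If PyJWT is not available, the same claims can be read with the standard library alone, since a JWT payload is just base64url-encoded JSON. The sketch below is an alternative to jwt.decode, not what the snippet above does; both function names are hypothetical, and "email" as the claim key is an assumption, since the token may carry the address under a different claim.

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the claims of a JWT without verifying its signature."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = segments[1]
    # base64url segments drop their '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def principal_summary(claims):
    """Pick out the claims discussed above, skipping any that are absent."""
    wanted = ("sub", "appid", "name", "email")  # "email" key is an assumption
    return {key: claims[key] for key in wanted if key in claims}
```

Passing token.token from the earlier snippet to decode_jwt_claims, then the result to principal_summary, gives a short view of which "Principal" the credential resolved to. As with the PyJWT version, nothing here validates the token, so it is for diagnosis only.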

I hope this helps. It took me about a day to work through this problem.

scottaddie commented 2 weeks ago

What I found helpful was when I (finally) tracked down the actual behavior of DefaultAzureCredential, which was unexpected to me.

@cdhinrichs We recently documented credential chains, including DefaultAzureCredential, at https://aka.ms/azsdk/python/identity/credential-chains. I'm curious whether the guidance in that doc would have been helpful to you.