Azure / bicep-types-az

Bicep type definitions for ARM resources

Issue with Open AI Provider Microsoft.CognitiveServices/accounts/deployments #1660

Open · JFolberth opened this issue 1 year ago

JFolberth commented 1 year ago

Bicep version 0.18.4

Describe the bug

I am fairly confident this is an issue with the native provider. I am attempting to deploy a Cognitive Services account with a gpt-35-turbo deployment.

The portal shows the following:

{
  "type": "Microsoft.CognitiveServices/accounts/deployments",
  "apiVersion": "2022-12-01",
  "name": "[concat(parameters('accounts_pdfgptdemo_dev_eus_name'), '/gpt-35-turbo')]",
  "dependsOn": [
    "[resourceId('Microsoft.CognitiveServices/accounts', parameters('accounts_pdfgptdemo_dev_eus_name'))]"
  ],
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "gpt-35-turbo",
      "version": "0301"
    },
    "scaleSettings": {
      "scaleType": "Standard",
      "capacity": 120
    },
    "raiPolicyName": "Microsoft.Default"
  }
},

My first Bicep attempt was:

resource gpt4Deployment 'Microsoft.CognitiveServices/accounts/deployments@2022-12-01' = {
  parent: openAI
  name: 'gpt-35-turbo'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
    scaleSettings: {
      scaleType: 'Standard'
      capacity: 120
    }
    raiPolicyName: 'Microsoft.Default'
  }
}

I got the error "See inner errors for details.", with inner error code "InvalidCapacity": "The capacity should be null for standard deployment."

I switched to capacity: null and now get "The specified capacity '120' of account deployment is bigger than available capacity '119' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo'".
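
For clarity, the variant that produced the second error looked like this (a sketch of the change just described; only the scaleSettings block differs from the attempt above):

resource gpt4Deployment 'Microsoft.CognitiveServices/accounts/deployments@2022-12-01' = {
  parent: openAI
  name: 'gpt-35-turbo'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
    scaleSettings: {
      scaleType: 'Standard'
      capacity: null // per the InvalidCapacity error, capacity must be null for a standard deployment
    }
    raiPolicyName: 'Microsoft.Default'
  }
}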

To Reproduce

My IaC is here: https://github.com/JFolberth/PDFgpt/tree/gpt3. The Python code isn't 100% compatible with GPT-3.5, but the IaC will reproduce this issue. When run, the deployment fails yet still creates the OpenAI model.


alex-frankel commented 1 year ago

Yes, please open a support ticket for this one.

JFolberth commented 1 year ago

Case opened with support: 2306200040000242. Any help getting it to the appropriate group would be appreciated.

johnnyreilly commented 12 months ago

We're also bumping into this - is there any news?

JFolberth commented 11 months ago

To update: there is definitely something off in the 2023-05-01 resource provider. We got it to work by downgrading to 2022-12-01 and adjusting capacity; with 2023-05-01 we were unable to get the deployment to work at all.

For starters, the documentation is out of date, as the scaleSettings setting is replaced with sku in the newer API version. It did also work when I upgraded my capacity to 240, then downgraded to 120. The ticket is still being moved along the proper channels.
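
For reference, a minimal sketch of the schema change between the two API versions (assuming an existing parent resource named openAI; only one of the two declarations would actually be used, and the sku values follow the working example later in this thread):

// 2022-12-01: scale settings live under properties
resource deploymentOld 'Microsoft.CognitiveServices/accounts/deployments@2022-12-01' = {
  parent: openAI
  name: 'gpt-35-turbo'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
    scaleSettings: {
      scaleType: 'Standard'
    }
  }
}

// 2023-05-01: scaleSettings is deprecated; capacity moves to a top-level sku
resource deploymentNew 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAI
  name: 'gpt-35-turbo'
  sku: {
    name: 'Standard'
    capacity: 120 // quota, in thousands of tokens per minute (TPM)
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
  }
}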

johnnyreilly commented 11 months ago

Just to back up what @JFolberth says, we tried setting capacity via Bicep and got:

[error]InvalidResourceProperties: The scale settings of account deployment is deprecated since API version '2023-05-01', please use 'sku' instead.

johnrsanders commented 11 months ago

@JFolberth we updated our docs and Bicep yesterday; please try the latest today and let me know if this doesn't resolve the issue. Thank you for reporting it.

API DOCUMENTATION:

Please note the sku.capacity property, which is used to set quota.

REST: https://learn.microsoft.com/en-us/rest/api/cognitiveservices/accountmanagement/deployments/create-or-update?tabs=HTTP

ARM/Terraform/Bicep: https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/accounts?pivots=deployment-language-bicep

JFolberth commented 11 months ago

@johnrsanders unfortunately the deployment still does not work with the new API: "code": "InvalidRequestContent", "message": "The request content was invalid and could not be deserialized."

You can reach out on Teams for more detail, or I can give you access to my resource. The scenario is an existing OpenAI instance with a deployment being submitted by Bicep where no changes are occurring. I have the IaC updated: https://github.com/JFolberth/PDFgpt/blob/main/infrastructure/modules/openAI.module.bicep

JFolberth commented 11 months ago

Digging further into this with the provided documentation, I think this is going to cause a breaking change, not to mention a lot of confusion, if what I read is accurate and I understood it: the property 'name' on older APIs actually maps to 'tier', and 'name' is still required, though per the documentation that value will now be a combination of a number and a letter as opposed to a value like 'Standard'. I did attempt to create a deployment manually and export it, and still did not get the updated 'tier' values.

As an experiment I updated my sku object to:

sku: {
  name: 'S1'
  tier: 'Standard'
  capacity: 120
}

I first attempted to use the sku name of the Cognitive Services account, and then took an educated guess at the sku name, since I was unable to locate any way to retrieve this information; as mentioned, an export from the portal did not show this property mapping or its values. Now I get an error of "The specified capacity '1' of account deployment is bigger than available capacity '0' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo'."

johnnyreilly commented 11 months ago

I've successfully used sku to deploy resources sized by capacity using Bicep. I'll see if I can dig out an example today

johnnyreilly commented 11 months ago

Consider the following account-deployments.bicep:

@description('Name of the Cognitive Services resource')
param cognitiveServicesName string

@description('Name of the deployment resource.')
param deploymentName string

@description('Deployment model format.')
param format string

@description('Deployment model name.')
param name string

@description('Deployment model version.')
param version string = '1'

@description('The name of RAI policy.')
param raiPolicyName string = 'Default'

@allowed([
  'NoAutoUpgrade'
  'OnceCurrentVersionExpired'
  'OnceNewDefaultVersionAvailable'
])
@description('Deployment model version upgrade option. see https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts/deployments?pivots=deployment-language-bicep#deploymentproperties')
param versionUpgradeOption string = 'OnceNewDefaultVersionAvailable'

@description('''Deployments SKU see: https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts/deployments?pivots=deployment-language-bicep#sku
eg:

sku: {
  name: 'Standard'
  capacity: 10
}

''')
param sku object

// https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts?pivots=deployment-language-bicep
resource cog 'Microsoft.CognitiveServices/accounts@2023-05-01' existing = {
  name: cognitiveServicesName
}

// https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts/deployments?pivots=deployment-language-bicep
resource deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  name: deploymentName
  parent: cog
  sku: sku
  properties: {
    model: {
      format: format
      name: name
      version: version
    }
    raiPolicyName: raiPolicyName
    versionUpgradeOption: versionUpgradeOption
  }
}

output deploymentName string = deployment.name
output deploymentResourceId string = deployment.id

We use this to deploy like so:

var cognitiveServicesDeployments = [
  // Commented models are deprecated and blocked for new deployments
  // https://github.com/Azure-Samples/azure-search-openai-demo/issues/388
  // https://techcommunity.microsoft.com/t5/ai-cognitive-services-blog/announcing-updates-to-azure-openai-service-models/ba-p/3866757
  // https://github.com/Azure-Samples/azure-search-openai-demo/pull/389
  // {
  //   name: 'OpenAi-text-davinci-003'
  //   shortName: 'davinci'
  //   model: {
  //     format: 'OpenAI'
  //     name: 'text-davinci-003'
  //     version: '1'
  //   }
  //   sku: {
  //     name: 'Standard'
  //     capacity: 10
  //   }
  // }
  {
    name: 'OpenAi-gpt-35-turbo'
    shortName: 'gpt35t'
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
    sku: {
      name: 'Standard'
      capacity: repositoryBranch == 'refs/heads/main' ? 100 : 10
    }
  }
]

// Model Deployment
@batchSize(1)
module openAiAccountsDeployments35Turbo 'account-deployments.bicep' = [for deployment in cognitiveServicesDeployments: {
  name: '${deploymentPrefix}-${deployment.shortName}-cog-accounts-deployments'
  params: {
    cognitiveServicesName: openAi.outputs.cognitiveServicesName
    deploymentName: deployment.name
    format: deployment.model.format
    name: deployment.model.name
    version: deployment.model.version
    sku: deployment.sku
  }
}]

We're currently only deploying a single account deployment in our array, but notice the sku portion:

    sku: {
      name: 'Standard'
      capacity: repositoryBranch == 'refs/heads/main' ? 100 : 10
    }

Here we provision a larger capacity for our main branch deployments than our feature branch deployments. Hopefully this helps @JFolberth

JFolberth commented 11 months ago

@johnnyreilly Thanks for that! Unfortunately I am still getting a quota error, which makes me think the way quota is being evaluated is incorrect, as the Bicep/ARM deployment shouldn't be impacting quota: The specified capacity '1' of account deployment is bigger than available capacity '0' for UsageName 'Tokens Per Minute (thousands) -

I do think the documentation on the deployment endpoint is then incorrect, as it does not line up with the resource itself:

[screenshot of the deployment endpoint documentation]

johnnyreilly commented 11 months ago

Interesting - I wonder why that works for us but not for you. We deploy to westeurope - not sure if that's relevant

johnrsanders commented 11 months ago

@JFolberth - reached out on Teams to get your sub ID info and see if it's an account perms issue or an API issue.

JFolberth commented 10 months ago

To update the thread: it has been confirmed that if I have a quota of 100 for the subscription with an existing deployment of 100, and I redeploy that OpenAI deployment, even without any updates, the provider will fail the deployment. It is taking the existing quota usage of 100 and adding it to the proposed deployment's 100, thus checking whether 200 is available, even though it is an update to an existing resource.

chandlerkent commented 10 months ago

We are also getting the same error as @JFolberth when attempting to do a deploy to an existing resource with no changes to the resource:

InsufficientQuota - The specified capacity '1' of account deployment is bigger than available capacity '0' for UsageName 'Tokens Per Minute (thousands) - Text-Embedding-Ada-002'.

pjirsa commented 10 months ago

@chandlerkent This sounds like a permissions issue. (Disclaimer - I am not an official support resource). However, I'm aware of a couple issues that could be causing this behavior.

  1. You must be an owner of the Azure OpenAI resource in order to create deployments. Even though the documentation says "Contributor" is sufficient, this is a known bug.
  2. You must have "Cognitive Services User" and "Cognitive Services Usages Reader" roles assigned in order to see the available quotas at the subscription level.

johnnyreilly commented 10 months ago

I wrote up my approach for deploying with a specific capacity here:

https://johnnyreilly.com/azure-open-ai-capacity-quota-bicep

Unrelated: there are other permissions issues with Cognitive Services; you cannot purge a deleted resource unless you're a Contributor at the subscription level.

JFolberth commented 10 months ago

@johnrsanders is there any update on this?

chandlerkent commented 9 months ago

@pjirsa this is not our particular issue. The issue stems from the fact that, for some reason, when you do a deploy it checks whether there is available quota even if you are not adding to or increasing the amount of quota being used. Here's an example:

I have deployed a gpt-35-turbo model which is using all 240 of the available quota. The first deploy runs without issue. Now I run the same template without any changes again and it fails with the InsufficientQuota error. This should not happen since there's no additional quota being used.

A fix for us that worked until this week was to make sure there was always at least 1 quota available. So in the previous example if I had a deployment that used 239 out of the 240 available quota, then all subsequent releases would work fine since it saw there was at least 1 quota available.
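
As a sketch of that headroom workaround (the parameter and account names here are hypothetical; per the next paragraph, this no longer suffices):

param totalQuotaTPM int = 240 // hypothetical: the subscription's quota for this model, in thousands of tokens/min

resource openAI 'Microsoft.CognitiveServices/accounts@2023-05-01' existing = {
  name: 'my-openai-account' // hypothetical
}

resource gpt35 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAI
  name: 'gpt-35-turbo'
  sku: {
    name: 'Standard'
    capacity: totalQuotaTPM - 1 // request 239 of 240, leaving 1 unit free so re-deploys pass the quota check
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0301'
    }
  }
}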

However, this broke again for us this week: it seems the check was changed to look not for just 1 unit of available quota, but for the full amount your deployment requests, even if the deployment would not result in any new quota being used. So now we are stuck, since we can no longer make any changes to our deployments through Bicep due to this error. We must update our deployments manually until this issue is resolved.

leanderkunstmann commented 9 months ago


@chandlerkent thanks for describing and clarifying the issue. We have the exact same problem and also solved it by decreasing the quota so there was always 1 left; that no longer works, and it would now only work if we used below 50% of our quota. I hope the MS colleagues will have a look at this soon. This is a really annoying problem, as we have to delete our resources every time before deployment.

Digma commented 9 months ago

Facing the same issue when wanting to use more than 50% of the quota and redeploying. We added a condition in Bicep not to redeploy the models except during the initial setup; see the sketch below. However, given the declarative nature of Bicep, we should not need to rely on that.
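
A sketch of that kind of guard, reusing the account-deployments.bicep module from earlier in the thread (the flag and parameter values are hypothetical):

param deployModels bool = true // hypothetical flag: set to false after initial setup to skip re-deploying models

module gpt35 'account-deployments.bicep' = if (deployModels) {
  name: 'gpt35t-cog-accounts-deployments'
  params: {
    cognitiveServicesName: 'my-openai-account' // hypothetical
    deploymentName: 'OpenAi-gpt-35-turbo'
    format: 'OpenAI'
    name: 'gpt-35-turbo'
    version: '0301'
    sku: {
      name: 'Standard'
      capacity: 120
    }
  }
}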

jongio commented 9 months ago

I ran into this error and got around it by deleting and purging all OpenAI instances and deployments in my sub.

JFolberth commented 9 months ago

This could be a workaround in some scenarios. In mine, I am attempting to configure an OpenAI architecture in an Azure Deployment Environment for on-demand provisioning of developer apps, so I wouldn't be able to automate the deleting and purging, since it's on demand.

tatroc commented 9 months ago

For me the issue was that I had deleted an Azure AI instance; by default, deleting an instance is a soft delete. It appears the capacity of 120 from the deleted instance still counted against my total quota of 240. Once I followed the directions here and purged the soft-deleted Azure AI instance, I was able to deploy the second Azure AI instance.

jenka13all commented 4 months ago

This still seems to be an issue: I have the exact same problems described here with versions 2023-05-01, 2022-12-01 and 2022-10-01. The documentation has not been updated for 2023-05-01 to indicate that the "scaleSettings" block should be removed, and the descriptions of the proper values for the "sku" block are still incorrect.

I am unable to deploy a gpt-4-turbo model with ARM: using the 2022-12-01 version and the scaleSettings block, I am told to use "null" for "capacity", and then, after that change, that neither "Standard" nor "Manual" is supported by the model. Those are, according to the documentation, the only two choices...

Was anyone else able to get their deployment through? I am not even getting to the part where it tells me that I don't have capacity, as other commenters have, because of another obvious bug in the code.

microsoft-github-policy-service[bot] commented 4 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @yangyuan. Please see https://aka.ms/biceptypesinfo for troubleshooting help.

roldengarm commented 2 weeks ago

Also having the same issue: re-running the same deployment fails the second time.