Azure / bicep-types-az

Bicep type definitions for ARM resources
MIT License
86 stars 27 forks

Azure Open AI deployment error. Unable to re-run the bicep script because of quota errors. #1844

Open anhashia opened 1 year ago

anhashia commented 1 year ago

Bicep version Bicep CLI version 0.21.1 (d4acbd2a9f)

Describe the bug If you rerun the provided sample script, it throws a quota error even though there is no change in the deployment script.

sku: {
      name: 'Standard' // SKU name
      capacity: 155 // SKU capacity; sets the TPM (tokens-per-minute) rate of the deployment
    }

Error

Inner Errors:
{"code": "InsufficientQuota", "message": "The specified capacity '155' of account deployment is bigger than available capacity '10' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo'."}

To Reproduce Below is a sample script called deploy.bicep that you can use for local reproduction. Replace the AOAI instance name and region with your own.

// Define an existing Cognitive Services resource using the 2023-05-01 API version  
// https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts?pivots=deployment-language-bicep  

// Declare the location parameter with a default value of 'northcentralus'
// This parameter specifies the location where the Azure OpenAI Cognitive Services resource will be created
@description('Specifies the location for resources.')
param location string = 'northcentralus'

// Declare the AOAIInstanceName parameter, which represents the name of the Azure OpenAI Cognitive Services resource  
// This parameter allows you to set a custom name for the resource  
@description('Name of Azure Open AI Cognitive Services resource')  
param AOAIInstanceName string = 'mynorthcentralus123' // example

// Create the myAzureOpenAI resource, which represents the Azure OpenAI Cognitive Services account  
// This resource creation sets the name, location, kind, SKU, and custom subdomain name properties  
resource myAzureOpenAI 'Microsoft.CognitiveServices/accounts@2023-05-01'  = {  
  name: AOAIInstanceName // Set the resource name to the value of the AOAIInstanceName parameter  
  location: location // Set the resource location to the value of the location parameter  
  kind: 'OpenAI' // Set the resource kind to 'OpenAI'  
  sku: {  
    name: 'S0' // Set the SKU name to 'S0'  
    tier: 'Standard' // Set the SKU tier to 'Standard'  
  }  
  properties: {  
    customSubDomainName: AOAIInstanceName // Set the custom subdomain name to the value of the AOAIInstanceName parameter  
    publicNetworkAccess: 'Disabled'
    networkAcls: { // Add this block to define network access control lists  
      defaultAction: 'Allow'  
      virtualNetworkRules: []  
      ipRules: []  
    }  
  }  
}  

// Define an array of Cognitive Services deployments with details  
// This array contains a single deployment with model information, RAI policy name, version upgrade option, and SKU  
var myAzureOpenAIDeployment = [  
  {  
    displayName: 'mync1' // Display name for the deployment  
    model: {  
      format: 'OpenAI' // Model format  
      name: 'gpt-35-turbo' // Model name  
      version: '0613' // Model version  
    }  
    raiPolicyName: 'Microsoft.Default' // RAI policy name for the deployment  
    versionUpgradeOption: 'OnceNewDefaultVersionAvailable' // Version upgrade option for the deployment  
    sku: {  
      name: 'Standard' // SKU name  
      capacity: 155 // SKU capacity; sets the TPM (tokens-per-minute) rate of the deployment
    }  
  }  
]  

// Create a Model Deployment resource for each deployment in the myAzureOpenAIDeployment array  
// Use batchSize(1) decorator to limit parallel deployments as they are not supported  
@batchSize(1)  
resource aoaiModelDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01'  = [for deployment in myAzureOpenAIDeployment: {  
  name: deployment.displayName // Set the deployment name to the display name from the array  
  parent:  myAzureOpenAI // Set the parent resource of the deployment to the myAzureOpenAI resource  
  properties: {  
    model: deployment.model // Set the model properties of the deployment from the array  
    raiPolicyName: deployment.raiPolicyName // Set the RAI policy name of the deployment from the array  
    versionUpgradeOption: deployment.versionUpgradeOption // Set the version upgrade option of the deployment from the array  
  }  
  sku: deployment.sku // Set the SKU properties of the deployment from the array  
}]  

// Output the capacity of the first deployment in the aoaiModelDeployment array  
// This output shows the capacity value of the model deployment created in the script  
output Capacity int = aoaiModelDeployment[0].sku.capacity  

Run the sample script: az deployment group create --resource-group NorthCentralUS --template-file ./deploy.bicep

Expected Result : Deployment should succeed without any errors

Actual Result :

{"code": "InvalidTemplateDeployment", "message": "The template deployment 'deploy' is not valid according to the validation procedure. The tracking id is '57aa6077-48aa-44a8-8160-cb530bd9477b'. See inner errors for details."}

Inner Errors:
{"code": "InsufficientQuota", "message": "The specified capacity '155' of account deployment is bigger than available capacity '10' for UsageName 'Tokens Per Minute (thousands) - GPT-35-Turbo'."}

Additional context

sku: {
  name: 'Standard' // SKU name
  capacity: 155 // SKU capacity; sets the TPM rate of the deployment
}

The capacity value is validated against the available quota without taking into account that the deployment already exists, so a rerun is treated as a request for new capacity rather than an update to an existing deployment.
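Given that behavior, one workaround is a pre-flight check that compares the requested capacity against the remaining quota before running the deployment. The sketch below assumes the JSON shape returned by `az cognitiveservices usage list --location <region>` (entries with `name.localizedValue`, `currentValue`, and `limit` fields; these field names are an assumption, not confirmed in this thread):

```python
import json


def remaining_quota(usages, usage_name):
    """Return limit - currentValue for the matching usage entry, or None if absent."""
    for usage in usages:
        # Field names below are assumed from common Azure usage payloads
        if usage.get("name", {}).get("localizedValue") == usage_name:
            return usage["limit"] - usage["currentValue"]
    return None


# Illustrative payload shaped like the quota in this issue's error message
sample = json.loads("""
[
  {
    "name": {"localizedValue": "Tokens Per Minute (thousands) - GPT-35-Turbo"},
    "currentValue": 145.0,
    "limit": 155.0
  }
]
""")

print(remaining_quota(sample, "Tokens Per Minute (thousands) - GPT-35-Turbo"))  # 10.0
```

If the remaining quota is smaller than the `capacity` in the template, deploying will fail with the InsufficientQuota error shown above, so the script can bail out (or skip the deployment) before calling `az deployment group create`.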

stephaniezyen commented 1 year ago

This is a quota error returned by Azure OpenAI and is not related to a Bicep language issue. Please open a support ticket with Azure OpenAI for more traction.

anhashia commented 1 year ago

Sounds good! Thanks for the update! I will follow up with the Azure OpenAI team.

ishaan-jaff commented 11 months ago

@anhashia

I'm the maintainer of LiteLLM. We let you maximize throughput and increase rate limits by load balancing between multiple deployments (Azure, OpenAI). I believe LiteLLM can be helpful here, and I'd love your feedback if we're missing something.

Here's how to use it. Docs: https://docs.litellm.ai/docs/routing

import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(model="gpt-3.5-turbo", 
                messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)

roldengarm commented 5 months ago

Sounds good! Thanks for the update! I will follow up with the Azure OpenAI team.

@anhashia did you ever get an update on this? We've got the same problem: rerunning the same deployment fails.