Closed: nathaniel-msft closed this issue 5 months ago
Thanks for raising this @nathaniel-msft.
This seems to be subscription related and not due to any bug in the infra-as-code templates. Users should ensure they have enough quota in the desired region before running the azd up command.
This Azure CLI command can be used to check tokens per minute quota for a particular region:
az cognitiveservices usage list \
--location eastus2 \
--query "[].{name: name.value, currentValue:currentValue, limit: limit}" \
-o table
Currently we are defaulting the model capacity to 30 TPM (see variables.tf), so the difference between Limit and CurrentValue in the CLI query above should be equal to or greater than that.
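As a sanity check, the available headroom can be computed from the CLI output. A minimal sketch in Python, assuming the JSON shape returned by az cognitiveservices usage list; the usage name and values below are illustrative, not real output:

```python
import json

# Hypothetical sample of the JSON returned by
# `az cognitiveservices usage list --location eastus2 -o json`
# (the name and values here are illustrative, not real output)
sample = json.loads("""
[
  {"name": {"value": "OpenAI.Standard.gpt-35-turbo"},
   "currentValue": 270.0, "limit": 300.0}
]
""")

REQUIRED_TPM = 30  # default model capacity from variables.tf

for usage in sample:
    # Headroom is the quota limit minus what is already in use
    available = usage["limit"] - usage["currentValue"]
    status = "OK" if available >= REQUIRED_TPM else "insufficient quota"
    print(f"{usage['name']['value']}: available={available:.0f} TPM ({status})")
```

With the sample values above this prints an available headroom of 30 TPM, which just meets the default capacity.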
Can you confirm the usage on your end?
Sounds good, I'll close this issue. There were concerns about setting the default to gpt-35-turbo since it's not available on all subscriptions, but there's currently no better alternative, as the -16k variant is an upgraded version of the same model.
Describe the bug
Not all models on Open AI are available for usage/consumption. As a result, some subscriptions aren't able to use the gpt-35-turbo model, and provisioning results in a hung cluster. Instead of a cluster stuck in a hung status, have it still run, just without the OpenAI Service, or try a different model.

To Reproduce
Steps to reproduce the behavior:
1. Run azd up on a subscription without available gpt-35-turbo quota.
2. The gpt-35-turbo model deployment is rejected with:
This operation requires 30 new capacity in quota Tokens Per Minute (thousands) - GPT-35-Turbo, which is bigger than the current available capacity 0. The current quota usage is 300 and the quota limit is 300 for quota Tokens Per Minute (thousands) - GPT-35-Turbo. (Code: InsufficientQuota)
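The numbers in that error tell the whole story: available capacity is the quota limit minus current usage, 300 - 300 = 0, which is less than the 30 TPM the deployment requests. A quick check of the arithmetic:

```python
# Values taken from the InsufficientQuota error message above
quota_limit = 300    # Tokens Per Minute (thousands) limit for GPT-35-Turbo
quota_usage = 300    # current quota usage
requested = 30       # new capacity the deployment asks for

available = quota_limit - quota_usage
print(available)               # 0
print(requested <= available)  # False -> InsufficientQuota
```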
Expected behavior
Instead of the cluster hanging, provisioning should still complete without the OpenAI service, or fall back to a different model.