Azure-Samples / azure-openai-rag-workshop

Create your own ChatGPT with Retrieval-Augmented-Generation workshop
https://aka.ms/ws/openai-rag
MIT License
97 stars 349 forks source link

Azure Trial does not support this workshop out-of-the-box #40

Open brucedkyle opened 3 days ago

brucedkyle commented 3 days ago

In "docs/sections/09-azure.md", the text suggests that a trial or free version of Azure is sufficient to deploy this workshop. However, it does not seem to deploy correctly. When I got to the point later in the workshop azd provision fails:

ERROR: deployment failed: error deploying infrastructure: deploying to subscription:

Deployment Error Details:
InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '3314ca9d-56c3-4e34-ba0a-209def153858'. See inner errors for details.
InsufficientQuota: This operation require 15 new capacity in quota Tokens Per Minute (thousands) - gpt-4o-mini - GlobalStandard, which is bigger than the current available capacity 1. The current quota usage is 0 and the quota limit is 1 for quota Tokens Per Minute (thousands) - gpt-4o-mini - GlobalStandard.

The support tool does not allow changes to quota. When you follow the support instructions to request an update, the link takes you to a form. The form to request change in quota is "closed". I tried it for US East 2. Not sure of other locations/configurations, etc.

Neither the support text nor CoPilot in that location have suggestions on how to solve these issues.

sinedied commented 3 days ago

Thanks for reporting this! We recently changed the model from gpt-4 to gpt-4o-mini, the restriction probably apply only to the new model.

I'll ask around to understand which models are restricted with Trial subscription and update the docs. Meanwhile, you can try a different model, even an "older" gpt-35-turbo works well for this workshop.

EDIT: I just found out that the 1TPM restriction applies to all models when using free trial, that means that the best way to fix the issue for now it to update the capacity to 1 here: https://github.com/Azure-Samples/azure-openai-rag-workshop/blob/main/infra/main.bicep#L51

I'll check if we can have a better workaround and update the docs.

brucedkyle commented 3 days ago

Many thanks. I was reading through all the docs last night down one rabbit hole after another trying to figure out how to set that value and how to reset my Bicep. This is valuable to know when it is time for production too.

So your suggestion is PERFECT. To do the dev/test in the lab, this should work perfectly.

I'll try it in a few hours.

My thanks.

brucedkyle commented 3 days ago

I may be getting ahead of myself, but should I be using the "ProvisionedManaged" SKU to also help manage my quota?

a. Not sure if that fixes the problem for the trial. b. Should that be my choice even when I go into the paid subscriptions?

If that's discussed in the lab, I'll be happy to learn more.

sinedied commented 3 days ago

I don't recommend using the ProvisionedManaged SKU (either for paid or trial) unless you have a consistent high throughput workload, as you get billed by the hour instead or per API usage with Standard/GlobalStandard SKU.

The best workaround would be to use GitHub Models if you have access, as you get 8K TPM and a simpler setup (you can use a similar setup as Ollama, if you're using the Qdrant variant of the workshop). We plan to add these as an alternative in the docs, this should be up in the coming weeks.

brucedkyle commented 3 days ago

I will check out GitHub Models and the Qdrant variant of the workshop. Both great suggestions.

I just have to say I so appreciate your support. And I hope my suggestions are helpful.

This is one of the finest labs I've worked on. Great details and excellent starting code from what I can tell.

This topic is huge. And I am being asked about RAG in every AI/ML interview. Being up on Azure is HUGE to me. And this is a great starting point.

brucedkyle commented 2 days ago

Thank you for your help. I am up with the 1TPM.