Azure / enterprise-azureai

Unleash the power of Azure AI to your application developers in a secure & manageable way with Azure API Management and Azure Developer CLI.
MIT License

Deployment challenges in PTU scenarios #51

Open punitkshah opened 5 months ago

punitkshah commented 5 months ago

When a customer has purchased provisioned throughput units (PTUs) for their Azure OpenAI instance, the deployed models are tied to those PTUs. In that case, the script should offer the option to use the existing Azure OpenAI endpoints and model names rather than trying to create these resources.

This can be accomplished by parameterizing the values of the endpoints to be used and employing a conditional flag for the deployment of Azure OpenAI resources and models.
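A minimal sketch of what this could look like in Bicep, assuming a conditional resource plus a fallback endpoint parameter. The names `deployOpenAI`, `existingOpenAIEndpoint`, `openAIName`, and `openAIEndpoint` are hypothetical and are not taken from this repository's main.bicep or ai.bicep; the SKU and API version are only illustrative.

```bicep
// Hypothetical parameters; none of these names come from the repository.
@description('Set to false to reuse an existing Azure OpenAI resource, for example one backed by PTUs.')
param deployOpenAI bool = true

@description('Endpoint of the existing Azure OpenAI resource. Only used when deployOpenAI is false.')
param existingOpenAIEndpoint string = ''

@description('Name for the Azure OpenAI resource. Only used when deployOpenAI is true.')
param openAIName string

param location string = resourceGroup().location

// The resource is created only when the flag is true.
resource openAI 'Microsoft.CognitiveServices/accounts@2023-05-01' = if (deployOpenAI) {
  name: openAIName
  location: location
  kind: 'OpenAI'
  sku: {
    name: 'S0'
  }
  properties: {
    customSubDomainName: openAIName
  }
}

// Downstream modules (for example the API Management backend) would consume
// this output without needing to know which path was taken.
output openAIEndpoint string = deployOpenAI ? openAI.properties.endpoint : existingOpenAIEndpoint
```

The same `if (deployOpenAI)` condition could be applied to the model deployment resources, with the existing deployment name passed in as another parameter.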

iMicknl commented 5 months ago

Thanks for your suggestion, @punitkshah! For my understanding, would a customer with provisioned deployments (PTU) not deploy this via infra as code already? My belief would be that a customer would adapt this repository to their needs, and add their current deployment in the infra code, instead of deploying both separately.

Once they have added the PTU deployment to the infra as code of this repository, an OpenAI model deployment that is already present won't be recreated.

punitkshah commented 5 months ago

@iMicknl - I was thinking of an approach that involves parameterizing the endpoint and model name. However, to incorporate existing resources, users would have to modify the repository by updating specific details, such as the model name and capacity, in the main.bicep and ai.bicep files.

In the case of most PTU deployments, which I anticipate will be common in scenarios where this proxy is utilized, the models are expected to have already been deployed with the required capacity.

While it is not a significant obstacle, based on a recent deployment, it appears that users find it more convenient to specify values in just one parameter file rather than updating multiple Bicep files.
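As an illustration of the single-parameter-file idea, here is a sketch of a `.bicepparam` file. It assumes main.bicep declares the hypothetical parameters `deployOpenAI`, `existingOpenAIEndpoint`, and `existingModelDeploymentName` (none of which exist in the repository today) and that the deployment tooling in use accepts `.bicepparam` files; the values are placeholders to be filled in by the user.

```bicep
// main.bicepparam (hypothetical): the only file a user would need to edit.
using './main.bicep'

// Skip creating Azure OpenAI resources and models; reuse what is already deployed.
param deployOpenAI = false

// Placeholder values for the existing PTU-backed resource and model deployment.
param existingOpenAIEndpoint = 'https://<your-resource-name>.openai.azure.com/'
param existingModelDeploymentName = '<your-ptu-deployment-name>'
```

With this shape, the model name and capacity in main.bicep and ai.bicep would stay untouched for users who bring their own PTU deployments.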