Closed — rliberoff closed this issue 1 week ago
OK, I just ran this locally and didn't have any issues. Let's confirm a couple of things, @rliberoff.
e.g.
```shell
git fetch --tags
git checkout tags/v0.3.0
```
The right KAITO image for Phi-3 support is `mcr.microsoft.com/aks/kaito/workspace:0.3.0`. Is this image being used? You can check this by running `kubectl describe` on the kaito-workspace pod (it should be in the kaito-workspace namespace). It should show:
```
Successfully pulled image "mcr.microsoft.com/aks/kaito/workspace:0.3.0"
```
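If you prefer a one-liner, something like the following should print the image tag in use (assuming the controller deployment is named `kaito-workspace` in the `kube-system` namespace, as with the managed addon; adjust names if your installation differs):

```shell
# Print the container image used by the kaito-workspace deployment
kubectl get deployment kaito-workspace -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```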
Can you paste the workspace controller log? It should tell us whether the webhook server has a problem or not.
Hi @ishaansehgal99 and @Fei-Guo ,
Thank you for your answer. Please allow me a few days to gather this information. Thank you!
Hi @ishaansehgal99 and @Fei-Guo,
I was able to reproduce the error again using AZ CLI version 2.62.0 with the `aks-preview` extension enabled, in PowerShell on Windows 11, following the steps documented here → https://learn.microsoft.com/en-us/azure/aks/ai-toolchain-operator
BTW, I have the `AIToolchainOperatorPreview` feature flag enabled in the subscription:
```json
{
  "id": "/subscriptions/…/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/AIToolchainOperatorPreview",
  "name": "Microsoft.ContainerService/AIToolchainOperatorPreview",
  "properties": {
    "state": "Registered"
  },
  "type": "Microsoft.Features/providers/features"
}
```
The region I'm deploying the resources to is France Central.
Here are the steps I have just performed to reproduce the error:
```powershell
$AZURE_SUBSCRIPTION_ID="c93…354"
$AZURE_RESOURCE_GROUP="rg-relv-test-kaito"
$AZURE_LOCATION="francecentral"
$CLUSTER_NAME="aks-relv-test-kaito"
az group create --name $AZURE_RESOURCE_GROUP --location $AZURE_LOCATION
az aks create --location $AZURE_LOCATION --resource-group $AZURE_RESOURCE_GROUP --name $CLUSTER_NAME --enable-oidc-issuer --enable-ai-toolchain-operator --generate-ssh-keys
```
Then I used `kubectl` to connect to the new cluster and check the nodes:
```powershell
az aks get-credentials --resource-group $AZURE_RESOURCE_GROUP --name $CLUSTER_NAME
kubectl get nodes
```
Output:
```
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-20454229-vmss000000   Ready    agent   3m26s   v1.28.10
aks-nodepool1-20454229-vmss000001   Ready    agent   3m29s   v1.28.10
aks-nodepool1-20454229-vmss000002   Ready    agent   3m6s    v1.28.10
```
```powershell
$MC_RESOURCE_GROUP=$(az aks show --resource-group $AZURE_RESOURCE_GROUP --name $CLUSTER_NAME --query nodeResourceGroup -o tsv)
$PRINCIPAL_ID=$(az identity show --name "ai-toolchain-operator-$CLUSTER_NAME" --resource-group "$MC_RESOURCE_GROUP" --query 'principalId' -o tsv)
$KAITO_IDENTITY_NAME="ai-toolchain-operator-$CLUSTER_NAME"
$AKS_OIDC_ISSUER=$(az aks show --resource-group "$AZURE_RESOURCE_GROUP" --name "$CLUSTER_NAME" --query "oidcIssuerProfile.issuerUrl" -o tsv)
az role assignment create --role "Contributor" --assignee "$PRINCIPAL_ID" --scope "/subscriptions/c93…354/resourcegroups/$AZURE_RESOURCE_GROUP"
az identity federated-credential create --name "kaito-federated-identity" --identity-name "$KAITO_IDENTITY_NAME" -g "$MC_RESOURCE_GROUP" --issuer "$AKS_OIDC_ISSUER" --subject system:serviceaccount:"kube-system:kaito-gpu-provisioner" --audience api://AzureADTokenExchange
kubectl rollout restart deployment/kaito-gpu-provisioner -n kube-system
```
Output:
```
deployment.apps/kaito-gpu-provisioner restarted
```
```powershell
kubectl get deployment -n kube-system | grep kaito
```
Output:
```
kaito-gpu-provisioner   1/1     1            1           8m24s
kaito-workspace         1/1     1            1           8m24s
```
```powershell
kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_phi_3.yaml
```
The error is:
```
Error from server (InternalError): error when creating "https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_phi_3.yaml": Internal error occurred: failed calling webhook "validation.workspace.kaito.sh": failed to call webhook: Post "https://workspace-webhook-svc.kube-system.svc:9443/validate/workspace.kaito.sh?timeout=10s": EOF
```
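To see whether the webhook server itself failed to start, the workspace controller log can be pulled with something like the following (assuming the managed addon's deployment name and namespace):

```shell
# Tail the workspace controller log; webhook server startup errors
# usually show up here
kubectl logs deployment/kaito-workspace -n kube-system --tail=100
```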
Looking into AKS, and following the advice from @ishaansehgal99, I checked the kaito-workspace pod, and it seems it is using `mcr.microsoft.com/aks/kaito/workspace:0.2.2` instead of `mcr.microsoft.com/aks/kaito/workspace:0.3.0`.
So it seems the issue is that AKS has not been updated (perhaps in the `francecentral` region) with the latest release of KAITO.
Do you know if there is a way to upgrade the workspace to the 0.3.0 version?
And if so, could you please guide me with the necessary steps? Sadly I'm a newbie in Kubernetes and AKS.
Your help and assistance is much appreciated.
Thank you!
So, no matter what I do to force the AKS deployment to use `mcr.microsoft.com/aks/kaito/workspace:0.3.0`, it eventually goes back to `mcr.microsoft.com/aks/kaito/workspace:0.2.2` 😢
Is there a way to tell AKS to use `mcr.microsoft.com/aks/kaito/workspace:0.3.0` instead of `mcr.microsoft.com/aks/kaito/workspace:0.2.2`?
Thank you.
@rliberoff, you are using the AKS-managed KAITO addon. We have not released 0.3.0 in the AKS addon yet. If you want to use Phi-3, please use the upstream chart installation guide and install an upstream version for now: https://github.com/Azure/kaito/blob/main/docs/installation.md
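For reference, a rough sketch of the upstream install is below; the chart path is assumed from the repository layout, and the linked installation guide is the authoritative source:

```shell
# Clone the repo at the desired release tag and install the workspace
# chart from the local checkout (path assumed from the repo layout)
git clone https://github.com/Azure/kaito.git
cd kaito
git fetch --tags
git checkout tags/v0.3.0
helm install workspace ./charts/kaito/workspace \
  --namespace kaito-workspace --create-namespace
```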
Hi @Fei-Guo,
I see. OK, I need to figure out how to translate those instructions into a Terraform script. At the end of the day, we are deploying the whole solution we're building with Terraform.
Thank you.
Hi @Fei-Guo and @ishaansehgal99,
I was able to deploy KAITO version 0.3.0 using Terraform, based on the documentation you suggested (👉🏻 https://github.com/Azure/kaito/blob/main/docs/installation.md).
But I'm not able to create a Phi-3 medium model.
My YAML is as follows:
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-phi-3-medium
  namespace: "kaito-rag"
  annotations:
    kaito.sh/enablelb: "False"
resource:
  count: 1
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: phi-3
inference:
  preset:
    name: "phi-3-medium-4k-instruct"
```
What am I doing wrong?
Is Phi-3 medium supported?
I thought it was, based on the code from: https://github.com/Azure/kaito/blob/f259329a4e1cff3d1f5a7846c89733619e2e9d4a/presets/models/phi3/model.go#L34-L37
Thank you!
@rliberoff, what was the error? Was the inference deployment created or not? If it was created, please share the container log of the inference pod. If the deployment was not created, please check the workspace status; in particular, was a GPU node (Standard_NC12s_v3) created successfully?
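Checking both can be done with commands along these lines (workspace name and namespace taken from the YAML above; adjust as needed):

```shell
# Workspace conditions show whether resource provisioning and the
# inference deployment succeeded
kubectl describe workspace workspace-phi-3-medium -n kaito-rag

# Look for a Standard_NC12s_v3 GPU node that has joined the cluster
kubectl get nodes -o wide
```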
BTW, the `gpu-provisioner` deployed using `helm` as described in the documentation is also not starting.
Getting a `describe` of the pod shows this information:
```
Name:                 gpu-provisioner-57d5c4959b-lkwkx
Namespace:            gpu-provisioner
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      gpu-provisioner
Node:                 aks-user-38844619-vmss000000/10.241.0.4
Start Time:           Thu, 25 Jul 2024 18:54:52 +0200
Labels:               app.kubernetes.io/instance=gpu-provisioner
                      app.kubernetes.io/name=gpu-provisioner
                      azure.workload.identity/use=true
                      pod-template-hash=57d5c4959b
Annotations:          checksum/settings: 50517b08c8328802043c3fdcb348c2a8847cc84c62f86242db7fe59824f0ba83
                      kubectl.kubernetes.io/restartedAt: 2024-07-25T18:54:51+02:00
Status:               Running
IP:                   172.0.3.167
IPs:
  IP:  172.0.3.167
Controlled By:  ReplicaSet/gpu-provisioner-57d5c4959b
Containers:
  controller:
    Container ID:   containerd://d6d8cd513dae110dd7770e86638a5dba1cd36460169dd7e1f9d9776f8c1456f5
    Image:          mcr.microsoft.com/aks/kaito/gpu-provisioner:0.2.0
    Image ID:       mcr.microsoft.com/aks/kaito/gpu-provisioner@sha256:1204a7e948e9a5efbe14561e14ed6fb0bc5936aaf787e870bd6416da5b584874
    Port:           8081/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 25 Jul 2024 18:57:55 +0200
      Finished:     Thu, 25 Jul 2024 18:57:58 +0200
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:  500m
    Requests:
      cpu:  200m
    Liveness:   http-get http://:http/healthz delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/readyz delay=5s timeout=30s period=10s #success=1 #failure=3
    Environment:
      CONFIG_LOGGING_NAME:         gpu-provisioner-config-logging
      SYSTEM_NAMESPACE:            gpu-provisioner (v1:metadata.namespace)
      ARM_SUBSCRIPTION_ID:         c93dfe1e-224e-4aad-a8b6-6624b4537354
      LOCATION:                    francecentral
      AZURE_CLUSTER_NAME:          aks-kaito-rag-47ed0
      AZURE_NODE_RESOURCE_GROUP:   MC_rg-kaito-rag-47ed0_aks-kaito-rag-47ed0_francecentral
      ARM_RESOURCE_GROUP:          rg-kaito-rag-47ed0
      LEADER_ELECT:                false
      E2E_TEST_MODE:               false
      AZURE_CLIENT_ID:             f99ece97-9543-4263-93cf-abc904a1ee9e
      AZURE_TENANT_ID:             80c...-...-...-...-...4cd
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5d9cl (ro)
Conditions:
  Type                       Status
  PodReadyToStartContainers  True
  Initialized                True
  Ready                      False
  ContainersReady            False
  PodScheduled               True
Volumes:
  kube-api-access-5d9cl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  azure-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/instance=gpu-provisioner,app.kubernetes.io/name=gpu-provisioner
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m51s                  default-scheduler  Successfully assigned gpu-provisioner/gpu-provisioner-57d5c4959b-lkwkx to aks-user-38844619-vmss000000
  Normal   Pulling    4m51s                  kubelet            Pulling image "mcr.microsoft.com/aks/kaito/gpu-provisioner:0.2.0"
  Normal   Pulled     4m49s                  kubelet            Successfully pulled image "mcr.microsoft.com/aks/kaito/gpu-provisioner:0.2.0" in 1.113s (1.113s including waiting)
  Warning  BackOff    3m25s (x9 over 4m42s)  kubelet            Back-off restarting failed container controller in pod gpu-provisioner-57d5c4959b-lkwkx_gpu-provisioner(bc9a5c1e-ec33-4419-a6a8-a4e976f0b8f4)
  Normal   Created    3m11s (x5 over 4m49s)  kubelet            Created container controller
  Normal   Started    3m11s (x5 over 4m49s)  kubelet            Started container controller
```
I'd really appreciate any help to get this running on AKS.
Thank you.
Have you set up the workload identity? Note that before finishing this step, the gpu-provisioner controller pod will constantly fail with the following message in the log…
Can you show the log of the gpu-provisioner pod?
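The pod log can be pulled with something like the following (deployment name and namespace assumed from the describe output above):

```shell
# The gpu-provisioner log usually names the failing credential when
# workload identity is not set up correctly
kubectl logs deployment/gpu-provisioner -n gpu-provisioner --tail=100
```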
Hi @Fei-Guo,
Yes, just a few minutes ago I was able to access the log of the `gpu-provisioner`, and there was an issue with the federated identity. I'm now trying to fix it.
On the other hand, could you please tell me if Phi-3 Medium is currently supported by 0.3.0? It seems it is not 🫤
Thank you.
https://github.com/Azure/kaito/blob/66f57116fa9827a13c023d57b300928ecd2ce640/presets/models/supported_models.yaml#L115 It is supported, we should have a model image ready for phi3-medium. I found the model name is missing in the doc https://github.com/Azure/kaito/tree/main/presets/models/phi3, will fix it.
Hi @Fei-Guo,
So, finally I was able to deploy phi-3 medium 😃
Sadly, it takes forever to answer. In fact, I was unable to get an answer from it. I think I'm using a quite capable VM (the `Standard_NC12s_v3`), and yet doing a `curl` with a simple question such as "Tell me about Tuscany and its cities." never gets a response.
On the other hand, the phi-3 mini did work. I think I will try to adjust the prompt to get the expected response format from the phi-3 mini model and forget about phi-3 medium.
Thanks a lot for your help!
Hi @Fei-Guo and @ishaansehgal99,
The following is the `kubectl describe pod` output for the pod containing a `phi-3-medium` model.
The thing is that everything deploys successfully, but the model never answers a question.
This is the description:
```
Name:             kaito-workspace-phi-3-medium-4k-instruct-5d685658b9-dzqfw
Namespace:        kaito-rag
Priority:         0
Service Account:  default
Node:             aks-ws3e3bbb692-12520279-vmss000000/10.240.0.7
Start Time:       Mon, 29 Jul 2024 19:57:18 +0200
Labels:           kaito.sh/workspace=kaito-workspace-phi-3-medium-4k-instruct
                  pod-template-hash=5d685658b9
Annotations:      <none>
Status:           Running
IP:               172.0.4.208
IPs:
  IP:  172.0.4.208
Controlled By:  ReplicaSet/kaito-workspace-phi-3-medium-4k-instruct-5d685658b9
Containers:
  kaito-workspace-phi-3-medium-4k-instruct:
    Container ID:  containerd://60c28966a1cfb3fa109acecd67d638aa21ae36d93a523b61b71b365f36ae25be
    Image:         mcr.microsoft.com/aks/kaito/kaito-phi-3-medium-4k-instruct:0.0.1
    Image ID:      mcr.microsoft.com/aks/kaito/kaito-phi-3-medium-4k-instruct@sha256:c106975f8e09a03d32118c7fff6f690ab705587dcdb83b2aad41d3c9ed30b740
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      accelerate launch --gpu_ids=all --num_processes=1 --num_machines=1 --machine_rank=0 inference_api.py --torch_dtype=auto --pipeline=text-generation --trust_remote_code
    State:          Running
      Started:      Mon, 29 Jul 2024 20:03:14 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Liveness:     http-get http://:5000/healthz delay=600s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:5000/healthz delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fm5ql (ro)
Conditions:
  Type                       Status
  PodReadyToStartContainers  True
  Initialized                True
  Ready                      True
  ContainersReady            True
  PodScheduled               True
Volumes:
  kube-api-access-fm5ql:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             nvidia.com/gpu:NoSchedule op=Exists
                             sku=gpu:NoSchedule
Events:
  Type    Reason     Age  From               Message
  ----    ------     ---  ----               -------
  Normal  Scheduled  44m  default-scheduler  Successfully assigned kaito-rag/kaito-workspace-phi-3-medium-4k-instruct-5d685658b9-dzqfw to aks-ws3e3bbb692-12520279-vmss000000
  Normal  Pulling    44m  kubelet            Pulling image "mcr.microsoft.com/aks/kaito/kaito-phi-3-medium-4k-instruct:0.0.1"
  Normal  Pulled     38m  kubelet            Successfully pulled image "mcr.microsoft.com/aks/kaito/kaito-phi-3-medium-4k-instruct:0.0.1" in 5m55.332s (5m55.332s including waiting)
  Normal  Created    38m  kubelet            Created container kaito-workspace-phi-3-medium-4k-instruct
  Normal  Started    38m  kubelet            Started container kaito-workspace-phi-3-medium-4k-instruct
```
To test, I'm doing a `port-forward` and then the following `curl`:
```shell
curl -X POST http://localhost:5000/chat -H "accept: application/json" -H "Content-Type: application/json" -d '{"prompt":"Tell me about Tuscany and its cities.", "return_full_text": false, "generate_kwargs": {"max_length":4096}}'
```
The VM size used for this model is a `Standard_NC12s_v3`.
The logs in the pod are mostly `INFO: 10.240.0.7:48044 - "GET /healthz HTTP/1.1" 200 OK`, but there are a few with something different:
```
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading checkpoint shards: 100%|██████████| 6/6 [00:03<00:00, 1.96it/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
INFO:     Started server process [19]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
Model: Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 5120, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-39): 40 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (qkv_proj): Linear(in_features=5120, out_features=7680, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=5120, out_features=35840, bias=False)
          (down_proj): Linear(in_features=17920, out_features=5120, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=5120, out_features=32064, bias=False)
)
INFO:     10.240.0.7:59630 - "GET /healthz HTTP/1.1" 200 OK
INFO:     10.240.0.7:36058 - "GET /healthz HTTP/1.1" 200 OK
INFO:     10.240.0.7:40306 - "GET /healthz HTTP/1.1" 200 OK
...
INFO:     10.240.0.7:40798 - "GET /healthz HTTP/1.1" 200 OK
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
WARNING:transformers_modules.weights.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.
INFO:     10.240.0.7:33918 - "GET /healthz HTTP/1.1" 200 OK
...
```
Why does this work with phi-3-mini but not with phi-3-medium? 🫤
Any help is appreciated. Thank you!
Hi @rliberoff, thanks for sharing.
I recommend reducing the max length to speed up requests for the medium model. Additionally, we're adjusting the deployment specs to utilize all available GPUs, which should drastically improve inference time. We'll release this fix as soon as possible.
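For example, the same request with a much smaller `max_length` (the value here is illustrative, not a recommendation):

```shell
# Same /chat request as before, but capping generation at 512 tokens
# instead of 4096 to shorten inference time
curl -X POST http://localhost:5000/chat \
  -H "accept: application/json" -H "Content-Type: application/json" \
  -d '{"prompt":"Tell me about Tuscany and its cities.", "return_full_text": false, "generate_kwargs": {"max_length":512}}'
```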
To temporarily try out the fix manually, you can edit the inference workload template and change the `resources.requests` for GPU to 2 if you are using `Standard_NC12s_v3`. You should see much better inference performance.
Hi @Fei-Guo,
Thank you for the information. However, I don't understand what you mean by "the inference workload template". I'm quite new to this.
Could you please provide or point to an example of one of these templates?
Thank you!
Hey guys,
Could you please tell me how I can adjust the deployment specs to utilize all available GPUs on the Standard_NC12s_v3?
Thank you!
Apologies for the delay. You can edit the deployment specification by running the following command:
```shell
kubectl edit deployment <deployment_name>
```
In the configuration, update the resource limits and requests from:
```yaml
resources:
  limits:
    nvidia.com/gpu: "1"
  requests:
    nvidia.com/gpu: "1"
```
to
```yaml
resources:
  limits:
    nvidia.com/gpu: "2"
  requests:
    nvidia.com/gpu: "2"
```
Hi @ishaansehgal99,
I'm so sorry, but I've been trying like crazy to make this work. I don't understand how I can change the values of `nvidia.com/gpu` from `1` to `2` without actually cloning this repo and kind of manually making these changes.
Is there a way to set these configurations using the `kaito.sh/v1alpha1` `Workspace` CRD? Or when deploying the KAITO workspace from the Helm chart?
I'm trying to setup a demo that uses Terraform to deploy Kaito but using Phi-3 Medium (because Phi-3 Mini hallucinates a lot).
How can I indicate to use two GPUs?
To give you an idea, this is my current Terraform script for KAITO:
```hcl
data "azurerm_subscription" "current" {
}

resource "azurerm_role_assignment" "kaito_provisioner_assigned_identity_contributor_role" {
  principal_id         = data.azurerm_user_assigned_identity.kaito_identity.principal_id
  scope                = var.aks_id
  role_definition_name = "Contributor"
}

resource "kubernetes_namespace" "kaito_namespace" {
  metadata {
    name = var.kaito_aks_namespace
  }
}

resource "azapi_update_resource" "enable_kaito" {
  count       = var.use_upstream_version ? 0 : 1
  type        = "Microsoft.ContainerService/managedClusters@2024-03-02-preview"
  resource_id = var.aks_id
  body = jsonencode({
    properties = {
      aiToolchainOperatorProfile = {
        enabled = true
      }
    }
  })
}

data "azurerm_user_assigned_identity" "kaito_identity" {
  name                = var.kaito_identity_name
  resource_group_name = var.kaito_identity_resource_group_name
  depends_on          = [azapi_update_resource.enable_kaito]
}

resource "azurerm_federated_identity_credential" "kaito_federated_identity_credential" {
  name                = "id-federated-kaito"
  resource_group_name = data.azurerm_user_assigned_identity.kaito_identity.resource_group_name
  parent_id           = data.azurerm_user_assigned_identity.kaito_identity.id
  issuer              = var.aks_oidc_issuer_url
  audience            = ["api://AzureADTokenExchange"]
  subject             = var.use_upstream_version ? "system:serviceaccount:gpu-provisioner:gpu-provisioner" : "system:serviceaccount:kube-system:kaito-gpu-provisioner"
}

resource "helm_release" "kaito_workspace" {
  count            = var.use_upstream_version ? 1 : 0
  name             = "kaito-workspace"
  chart            = "${path.module}/charts/kaito/workspace/"
  namespace        = kubernetes_namespace.kaito_namespace.metadata.0.name
  create_namespace = false
}

resource "helm_release" "gpu_provisioner" {
  count = var.use_upstream_version ? 1 : 0
  name  = "kaito-gpu-provisioner"
  chart = "https://github.com/Azure/gpu-provisioner/raw/gh-pages/charts/gpu-provisioner-${var.gpu_provisioner_version}.tgz"
  wait  = true

  set {
    name  = "settings.azure.clusterName"
    value = var.aks_name
  }

  set {
    name  = "replicas"
    value = var.gpu_provisioner_replicas
  }

  set {
    name  = "controller.env[0].name"
    value = "ARM_SUBSCRIPTION_ID"
  }

  set {
    name  = "controller.env[0].value"
    value = data.azurerm_subscription.current.subscription_id
  }

  set {
    name  = "controller.env[1].name"
    value = "LOCATION"
  }

  set {
    name  = "controller.env[1].value"
    value = var.aks_location
  }

  set {
    name  = "controller.env[2].name"
    value = "AZURE_CLUSTER_NAME"
  }

  set {
    name  = "controller.env[2].value"
    value = var.aks_name
  }

  set {
    name  = "controller.env[3].name"
    value = "AZURE_NODE_RESOURCE_GROUP"
  }

  set {
    name  = "controller.env[3].value"
    value = var.aks_node_resource_group_name
  }

  set {
    name  = "controller.env[4].name"
    value = "ARM_RESOURCE_GROUP"
  }

  set {
    name  = "controller.env[4].value"
    value = var.resource_group_name
  }

  set {
    name  = "controller.env[5].name"
    value = "LEADER_ELECT"
  }

  set {
    name  = "controller.env[5].value"
    value = "false"
    type  = "string" # Forcefully set the type as `string` to avoid the error: `…cannot unmarshal bool into Go struct field EnvVar.spec.template.spec.containers.env.value of type string…`
  }

  set {
    name  = "controller.env[6].name"
    value = "E2E_TEST_MODE"
  }

  set {
    name  = "controller.env[6].value"
    value = "false"
    type  = "string" # Forcefully set the type as `string` to avoid the error: `…cannot unmarshal bool into Go struct field EnvVar.spec.template.spec.containers.env.value of type string…`
  }

  set {
    name  = "workloadIdentity.clientId"
    value = data.azurerm_user_assigned_identity.kaito_identity.client_id
  }

  set {
    name  = "workloadIdentity.tenantId"
    value = data.azurerm_user_assigned_identity.kaito_identity.tenant_id
  }
}

resource "kubectl_manifest" "kaito_ai_model" {
  yaml_body = <<-EOF
    apiVersion: kaito.sh/v1alpha1
    kind: Workspace
    metadata:
      name: kaito-${var.kaito_ai_model}
      namespace: ${kubernetes_namespace.kaito_namespace.metadata.0.name}
      annotations:
        kaito.sh/enablelb: "False"
    resource:
      count: 1
      instanceType: "${var.kaito_instance_type_vm_size}"
      labelSelector:
        matchLabels:
          apps: ${var.kaito_ai_model}
    inference:
      preset:
        name: "${var.kaito_ai_model}"
  EOF

  depends_on = [
    azapi_update_resource.enable_kaito,
    helm_release.kaito_workspace,
    helm_release.gpu_provisioner
  ]
}

resource "azurerm_network_security_rule" "kaito_ai_model_inference_network_security_rule" {
  name                        = "rule-${var.kaito_aks_namespace}-${var.kaito_inference_port}"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = 80
  source_address_prefix       = "Internet"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = var.network_security_group_name
}

resource "kubernetes_ingress_v1" "kaito_ai_model_inference_endpoint_ingress" {
  wait_for_load_balancer = true

  metadata {
    name      = "ingress-kaito-${var.kaito_ai_model}"
    namespace = kubernetes_namespace.kaito_namespace.metadata.0.name
    annotations = {
      "kubernetes.io/ingress.class" = "addon-http-application-routing"
    }
  }

  spec {
    rule {
      http {
        path {
          path      = "/chat"
          path_type = "Prefix"
          backend {
            service {
              name = "kaito-${var.kaito_ai_model}"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```
Thank you in advance!!!
@rliberoff
You need to change the `deployment` object created by the KAITO controller and update the pod template there. You need to do that with kubectl against the live k8s cluster. None of the Terraform scripts should be changed.
Note: this is just a hacky workaround. The code fix has been checked in and will be available in the next KAITO release.
Hi @Fei-Guo,
Thank you for the answer.
I guess I will try to make Phi-3 Mini work and wait for the next release.
I really appreciate your help and patience with this, guys! Thanks a lot 😀
By the way, I’m leaving the link to the repo here in case anyone needs it or finds it interesting.
Describe the bug
I'm trying to deploy a Phi-3 model in AKS, but every time I try to deploy the workspace, I get the following error:
The YAML definition is as follows:
I'm using France Central as my Azure Region.
Please help!
Thank you!
Steps To Reproduce
I'm using Terraform to deploy the AKS. Everything is as expected, but once I execute the following command:
I get the following error immediately:
Expected behavior
N/A
Logs
N/A
Environment
- Kubernetes version (`kubectl version`): 1.29.2
- OS (`cat /etc/os-release`): Windows 11

Additional context
N/A