Azure / AML-Kubernetes

AzureML customer managed k8s compute samples
MIT License
80 stars 32 forks source link

Error while deploying extension with Terraform #285

Open BzSpi opened 1 year ago

BzSpi commented 1 year ago

Hello,

While deploying with Terraform, I have the following error

Helm installation failed : Unable to render the helm chart and substitue helm values to get a valid yaml : Recommendation Please check if the config settings provided are valid : InnerError [failed to install CRD crds/prometheus-crd.yaml: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:kube-system:ext-installer-azureml-extension" cannot create resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope]

Here's a part of the Terraform code:

data "azurerm_key_vault_certificate_data" "aml" {
  key_vault_id = var.keyvault_id
  name         = var.machine_learning_extension.ssl_keyvault_certificate_name
}

resource "azurerm_resource_provider_registration" "kubernetes_configuration_registration" {
  name = "Microsoft.KubernetesConfiguration"
}

resource "azurerm_resource_provider_registration" "extension_manager_registration" {
  count = var.providers_registration_enabled ? 1 : 0

  name = "Microsoft.ContainerService"

  feature {
    name       = "AKS-ExtensionManager"
    registered = true
  }
}

resource "azurerm_kubernetes_cluster_extension" "machine_learning" {
  name           = "azureml-extension"
  cluster_id     = azurerm_kubernetes_cluster.aks.id
  extension_type = "Microsoft.AzureML.Kubernetes"

  configuration_settings = {
    enableTraining               = true
    enableInference              = true
    inferenceRouterServiceType   = "loadBalancer"
    allowInsecureConnections     = false
    internalLoadBalancerProvider = "azure"
    privateEndpointILB           = true
    sslCname                     = var.machine_learning_extension.endpoint_fqdn
  }

  configuration_protected_settings = {
    sslKey  = data.azurerm_key_vault_certificate_data.aml.key
    sslCert = data.azurerm_key_vault_certificate_data.aml.pem
  }

  depends_on = [azurerm_resource_provider_registration.kubernetes_configuration_registration, azurerm_resource_provider_registration.extension_manager_registration]
}

Deployment is made with a Service Principal.

BzSpi commented 1 year ago

The issue is that the extension must be deployed with the azureml namespace, default value does not work.

anwojcie commented 1 year ago

I do not think so.... You are missing a couple of needed configuration settings and also use some not existent (privateEndpointILB) check https://github.com/Azure/AML-Kubernetes/blob/master/files/terraform-template.tf and https://github.com/Azure/AML-Kubernetes/blob/master/docs/deploy-extension.md#review-azureml-deployment-configuration-settings

Also, reconsider using "Microsoft.AzureML.Kubernetes". https://github.com/MicrosoftDocs/azure-docs/issues/110642

BzSpi commented 1 year ago

Hi @anwojcie

Thanks for your response. It may be a behavior from the azurerm_kubernetes_cluster_extension resource which use the extension name as namespace.

Also, I've added a fully working code on how to deploy a private LB in issue #284