Azure / azure-iot-operations

The official repo for Azure IoT Operations.
MIT License

[bug] HostResourceRegionMismatch while deploying Azure IoT Operations #30

Closed: derbl4ck closed this issue 7 months ago

derbl4ck commented 9 months ago

Describe the bug

While deploying Azure IoT Operations in westeurope, the deployment fails both with the latest CLI extension and via the Azure Portal. It seems the location name has changed: westeurope becomes west%20europe and is therefore not recognized by some API endpoints.

Errors

Target: /subscriptions/xxxxxx/resourceGroups/rg-xxxxxx/providers/Microsoft.Resources/deployments/aziotops.init.e436b719fd1b4008964f5821342f298e
Exception Details:      (HostResourceRegionMismatch) Host resource region: west europe for ID: "/subscriptions/xxxxxx/resourcegroups/rg-xxxxxx/providers/microsoft.extendedlocation/customlocations/sro-aksee001-cl" does not match Custom Location region: westeurope
        Code: HostResourceRegionMismatch
        Message: Host resource region: west europe for ID: "/subscriptions/xxxxxx/resourcegroups/rg-xxxxxx/providers/microsoft.extendedlocation/customlocations/sro-aksee001-cl" does not match Custom Location region: westeurope
        Target: hostResourceId

A similar error occurs when you try to enable the custom-locations feature directly:

az connectedk8s enable-features -n sro-aksee001 -g rg-xxxxxx --features cluster-connect custom-locations
Error while fetching helm chart registry path: HTTPSConnectionPool(host='west%20europe.dp.kubernetesconfiguration.azure.com', port=443): Max retries exceeded with url: /azure-arc-k8sagents/GetLatestHelmPackagePath?api-version=2019-11-01-preview&releaseTrain=stable (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x08055A10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
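The host name in the error above is what you would get if a display name such as "West Europe" were lowercased and percent-encoded before being placed in front of the regional endpoint. The following is only a minimal sketch of that symptom: regional_host is a hypothetical helper, and the thread does not show how the CLI actually builds the host.

```python
from urllib.parse import quote


def regional_host(location: str) -> str:
    """Hypothetical helper: build a regional host name from a location string.

    Assumption: the host has the form "<location>.dp.kubernetesconfiguration.azure.com",
    as seen in the error message. The real CLI logic is not shown in this thread.
    """
    # quote() percent-encodes the space, so "west europe" becomes "west%20europe"
    return f"{quote(location.lower())}.dp.kubernetesconfiguration.azure.com"


# Display name with a space: yields a host name that cannot be resolved
print(regional_host("West Europe"))
# Canonical region name: yields a well-formed host name
print(regional_host("westeurope"))
```

A host name containing %20 is not a valid DNS name, which would be consistent with the getaddrinfo failure reported above.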

After modifying the ARM template to use the west%20europe form of the location name, the deployment proceeds to the next step and then fails with: "Microsoft.ExtendedLocation" resource provider does not have the required permissions to create a namespace on the cluster.

Cluster Environment

Developer Environment

chgennar commented 9 months ago

@derbl4ck, can you provide the Azure CLI command that you ran?

derbl4ck commented 9 months ago

I used the following CLI command:

az iot ops init --subscription xxxxxx -g rg-xxxxxx --cluster xxxxxx --kv-id /subscriptions/xxxxxx/resourceGroups/rg-xxxxxx/providers/Microsoft.KeyVault/vaults/kv-xxxxxx --custom-location xxxxxx-cl --target xxxxxx-target --dp-instance xxxxxx-processor --mq-instance mq-instance --mq-mode auto --mq-mem-profile low

In the meantime I also found the root cause: the engineer who created the AKS EE instance used "West Europe" as the location instead of "westeurope" in the aksedge-config.json:

{
  "SchemaVersion": "1.9",
  "Version": "1.0",
  "DeploymentType": "SingleMachineCluster",
  "Init": {
    "ServiceIPRangeSize": 100
  },
  "Arc": {
    "ClusterName": "xxxxxx",
    "Location": "West Europe",
...
}

I think the AKS EE team should validate that value, since other services probably do not check for this display-name form of the location. @chgennar, could you check whether we should also add some validation or a fallback to the IoT Operations ARM templates?
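As an illustration of the kind of validation suggested here, the sketch below checks the Arc.Location value of a config before it is used. It is not the validation that AKS EE or the ARM templates implement. The function names and the rule "lowercase letters and digits only" are assumptions made for this example.

```python
import json
import re

# Assumption: canonical region names consist only of lowercase letters and digits
CANONICAL = re.compile(r"^[a-z0-9]+$")


def normalize_location(value: str) -> str:
    """Turn a display name such as "West Europe" into "westeurope"."""
    # Drop all whitespace, then lowercase
    return "".join(value.split()).lower()


def check_arc_location(config: dict) -> str:
    """Return Arc.Location if it looks canonical, otherwise raise ValueError."""
    location = config["Arc"]["Location"]
    if not CANONICAL.fullmatch(location):
        suggestion = normalize_location(location)
        raise ValueError(
            f"Arc.Location {location!r} is not a canonical region name; "
            f"did you mean {suggestion!r}?"
        )
    return location


# Reduced config with only the fields relevant to this check
config = json.loads('{"Arc": {"ClusterName": "xxxxxx", "Location": "West Europe"}}')
try:
    check_arc_location(config)
except ValueError as err:
    print(err)
```

Failing early with a suggested value, rather than silently normalizing, keeps the config file and the deployed resources in agreement.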

chgennar commented 9 months ago

Thank you for the debugging information. I'll forward this to the AKS-EE team.

SummerSmith commented 7 months ago

Thanks for sharing this issue! In the latest AKS EE release, we added validation for the custom location value in the config json (called out in the release notes here).