Azure / AKS-Edge

Welcome to the Azure Kubernetes Service (AKS) Edge repo.
MIT License

Arc connection issue - Exception occurred while connecting to cluster - Exception calling "Add" with "2" argument(s): "An item with the same key has already been added." #183

Closed: see-dz closed this issue 3 weeks ago

see-dz commented 1 month ago

I have tried Arc-connecting AKS on several different servers with brand-new images, and I receive the same error on all of them. Has anyone seen this error when connecting to Arc, and what was your resolution?

    Exception occurred while connecting to cluster - Exception calling "Add" with "2" argument(s): "An item with the same key has already been added."

My Linux node settings are below:

    "LinuxNode": {
        "CpuCount": 6,
        "MemoryInMB": 8192,
        "DataSizeInGB": 25,
        "LogSizeInGB": 1,
        "TimeoutSeconds": 300,
        "TpmPassthrough": false,

This is what the log shows:

    AksEdge - Connecting cluster to Azure Arc failed: Exception occurred while connecting to cluster - Exception calling "Add" with "2" argument(s): "An item with the same key has already been added."
    at Test-ClusterConnectivity, C:\Program Files\WindowsPowerShell\Modules\AksEdge\1.6.384.0\AksEdge.psm1: line 3326
    at Set-ArcConnectionInternal, C:\Program Files\WindowsPowerShell\Modules\AksEdge\1.6.384.0\AksEdge.psm1: line 3212
    at Connect-AksEdgeArc, C:\Program Files\WindowsPowerShell\Modules\AksEdge\1.6.384.0\AksEdge.psm1: line 2662
    at <ScriptBlock>, <No file>: line 1

nelsonmorais commented 1 month ago

I have the same issue.

ChinthapalliNikhithaChandana commented 3 weeks ago

Facing the same issue.

SummerSmith commented 1 week ago

To resolve this issue, redeploy using the July 2024 update, or try the following workaround:

Confirm the permissions assigned to the service principal: ensure it has the "Kubernetes Cluster - Azure Arc Onboarding" and "Contributor" roles at the resource group (or subscription) level. If both roles are in place, apply the workaround below.
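The role check above could be scripted roughly as follows. This is only a sketch: `Get-MissingRole` and `Get-ArcOnboardingMissingRole` are hypothetical helper names, not part of the AksEdge module, and it assumes the Az.Resources module is installed and you are already signed in with `Connect-AzAccount`.

```powershell
# Hypothetical helper: returns the required role names that are
# absent from the assigned list (string comparison is case-insensitive).
function Get-MissingRole {
    param (
        [string[]]$AssignedRoles,
        [Parameter(Mandatory)] [string[]]$RequiredRoles
    )
    $RequiredRoles | Where-Object { $_ -notin $AssignedRoles }
}

# Hypothetical wrapper: looks up the service principal's role assignments
# for a resource group and reports which of the two required roles are missing.
# Assumes Az.Resources is available and an Azure session is active.
function Get-ArcOnboardingMissingRole {
    param (
        [Parameter(Mandatory)] [string]$AppId,              # service principal application ID
        [Parameter(Mandatory)] [string]$ResourceGroupName
    )
    $required = 'Kubernetes Cluster - Azure Arc Onboarding', 'Contributor'
    $assigned = (Get-AzRoleAssignment -ServicePrincipalName $AppId `
                    -ResourceGroupName $ResourceGroupName).RoleDefinitionName
    Get-MissingRole -AssignedRoles $assigned -RequiredRoles $required
}
```

Calling `Get-ArcOnboardingMissingRole -AppId <app-id> -ResourceGroupName <resource-group>` with your own values should print nothing when both roles are assigned, and the names of any missing roles otherwise.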

Replace "xyz" in the instructions below with corresponding values

cp 'C:\Program Files\WindowsPowerShell\Modules\AksEdge\1.6.384.0\AksEdge*' C:\Users\xyz\desktop\AksEdge\

Open C:\Users\xyz\desktop\AksEdge\AksEdge.psm1 and replace Test-ClusterConnectivity with the implementation below:

    function Test-ClusterConnectivity {
        param (
            [Parameter(Mandatory)]
            [System.Object]$ConnectClusterArgs
        )
        try {
            # Increasing wait time for cluster connectivity check to an hour
            $res = $false
            $clusterArgs = @{
                ClusterName = $ConnectClusterArgs['ClusterName']
                ResourceGroupName = $ConnectClusterArgs['ResourceGroupName']
            }
            Write-SubStatus "Connecting cluster to Azure"
            # [Bug]: [ArcK8s] Arc fails to connect even after a couple of retries (https://dev.azure.com/msazure/msk8s/_workitems/edit/25630601/)
            # Sometimes the connectivity status from New-AzConnectedKubernetes indicates `Connecting`, which could result in an incorrect/unnecessary retry.
            # Additionally, the higher back-off interval of 5 min and the lack of logs between retries result in behavior that resembles a hung screen.
            # Fixing this by setting the onboarding timeout parameter supported by New-AzConnectedKubernetes to 10 min for the cluster to connect (the connect command typically succeeds in under 2 min).
            # Log between the retry attempts to the user, retry in a deterministic way, and exit the connect command if we are unable to connect to Azure in under an hour ([10 min (onboarding timeout) + 1 min (backoff timeout)] * 5 (retries) = 55 min)
            New-AzConnectedKubernetes -OnboardingTimeout 600 -ProvisioningState 'Succeeded' @ConnectClusterArgs 2>&1 | Out-File -FilePath $ArcConnectionLocation -Append

            # Poll the connectivity status up to 20 times, 15 seconds apart
            for ($retries = 1; $retries -le 20; $retries++)
            {
                $arcConnected = Get-AzConnectedKubernetes @clusterArgs -ErrorAction SilentlyContinue 2>&1

                if ($arcConnected.ConnectivityStatus -eq "Connected")
                {
                    Write-SubStatus "Cluster reached connected status"
                    $res = $true
                    break
                }

                Write-SubStatus "Not connected yet (try count : $retries), retrying in 15 seconds.."
                Start-Sleep -Seconds 15
            }
            return $res
        }
        catch
        {
            $err = $_.Exception.Message.ToString()
            $msg = "Exception occurred while connecting to cluster - $err"
            Write-Status $msg -color Red
            throw $msg
        }
    }

Save the file, then reload the module from the patched copy:

Remove-Module AksEdge; Import-Module "C:\users\xyz\Desktop\AksEdge\AksEdge.psm1" -Force
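After the reload, it may be worth confirming that the session is using the patched copy rather than the one under Program Files. A minimal sketch, where `Test-ModuleLoadedFrom` is a hypothetical helper name (not part of AksEdge) that compares the loaded module's file location with the directory you expect:

```powershell
# Hypothetical helper: returns $true when a loaded module with the given
# name was imported from a file directly under the expected directory.
function Test-ModuleLoadedFrom {
    param (
        [Parameter(Mandatory)] [string]$Name,
        [Parameter(Mandatory)] [string]$ExpectedDirectory
    )
    # Normalize the expected directory and drop any trailing separator
    $expected = [System.IO.Path]::GetFullPath($ExpectedDirectory).TrimEnd('\', '/')
    foreach ($module in @(Get-Module -Name $Name)) {
        # Path is the file the module was loaded from (.psm1 or .psd1)
        $actual = (Split-Path -Parent $module.Path).TrimEnd('\', '/')
        if ($actual -eq $expected) { return $true }
    }
    return $false
}
```

For example, `Test-ModuleLoadedFrom -Name AksEdge -ExpectedDirectory 'C:\Users\xyz\Desktop\AksEdge'` (with "xyz" replaced as above) should return True once the patched module is loaded.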

I suggest trying a new deployment with a different cluster name:

    Remove-AksEdgeDeployment -Force
    New-AksEdgeDeployment...