Azure / aksArc

# Welcome to the Azure Kubernetes Service enabled by Azure Arc (AKS Arc) repo

This is where the AKS Arc team will track features and issues with AKS Arc. We will monitor this repo in order to engage with our community and discuss questions, customer scenarios, or feature requests. Check out our projects tab to see the roadmap for AKS Arc!
MIT License

[BUG] Get-AksHciConfig and Install-AksHci cannot find cloudconfig in the remote host #47

Closed erjosito closed 3 years ago

erjosito commented 4 years ago

Describe the bug: No cloudconfig file can be found.

To Reproduce: Not sure. I am deploying in Azure on a 2-node HCI cluster. I initialized the nodes and ran Set-AksHciConfig, but it made no difference.

Expected behavior: Install-AksHci and Get-AksHciConfig work.

Screenshots

PS C:\Users\labadmin> set-akshciconfig
[10/20/2020 02:12:42] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
PS C:\Users\labadmin> get-akshciconfig
[10/20/2020 02:07:27] Checking for configuration
 - Merging Windows Admin Center configuration
 - Loading Windows Admin Center configuration from 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json'...
 - Processing configuration...
[10/20/2020 02:07:27] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
[10/20/2020 02:07:27] Reading configuration
[10/20/2020 02:07:28] Validating configuration
[10/20/2020 02:07:28] Confirming Configuration
[10/20/2020 02:07:28] Determining deployment type
 - This is a multi-node deployment using failover cluster: AZSHCI
[10/20/2020 02:07:28] Verifying cloudconfig access file
 - Retrieving access file from WIN-MUCC37Q1OIO
Copy-Item : Cannot find path '\\WIN-MUCC37Q1OIO\C$\Users\labadmin\.wssd\cloudconfig' because it does not exist.
At C:\Program Files\WindowsPowerShell\Modules\Moc\0.2.8\Common.psm1:946 char:5
+     Copy-Item -Path $remotePath -Destination $destination
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\WIN-MUCC37Q1O...ssd\cloudconfig:String) [Copy-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
Unable to locate a valid cloudconfig access file.
At C:\Program Files\WindowsPowerShell\Modules\Moc\0.2.8\Common.psm1:1437 char:5
+     throw "Unable to locate a valid cloudconfig access file."
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Unable to locat...ig access file.:String) [], RuntimeException
    + FullyQualifiedErrorId : Unable to locate a valid cloudconfig access file.

Collect log files

PS C:\Users\labadmin> get-akshcilogs
[10/20/2020 02:15:02] Checking for configuration
 - Merging Windows Admin Center configuration
 - Loading Windows Admin Center configuration from 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json'...
 - Processing configuration...
[10/20/2020 02:15:02] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
[10/20/2020 02:15:02] Reading configuration
[10/20/2020 02:15:02] Validating configuration
[10/20/2020 02:15:02] Confirming Configuration
[10/20/2020 02:15:02] Determining deployment type
 - This is a multi-node deployment using failover cluster: AZSHCI
[10/20/2020 02:15:02] Verifying cloudconfig access file
 - Retrieving access file from WIN-PSIERIALC37
Copy-Item : Cannot find path '\\WIN-PSIERIALC37\C$\Users\labadmin\.wssd\cloudconfig' because it does not exist.
At C:\Program Files\WindowsPowerShell\Modules\AksHci\0.2.8\Common.psm1:946 char:5
+     Copy-Item -Path $remotePath -Destination $destination
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\WIN-PSIERIALC...ssd\cloudconfig:String) [Copy-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
Unable to locate a valid cloudconfig access file.
At C:\Program Files\WindowsPowerShell\Modules\AksHci\0.2.8\Common.psm1:1437 char:5
+     throw "Unable to locate a valid cloudconfig access file."
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Unable to locat...ig access file.:String) [], RuntimeException
    + FullyQualifiedErrorId : Unable to locate a valid cloudconfig access file.
PS C:\Users\labadmin> get-module -name akshci
ModuleType Version    Name                                ExportedCommands
---------- -------    ----                                ----------------
Script     0.2.8      AksHci                              {Get-AksHciCluster, Get-AksHciCredential, Get-AksHciKubernetesVersion, Get-AksHciLogs...}

Not sure where to find Get-SMEUILogs.ps1; it is not in the preview package I downloaded.

abhilashaagarwala commented 4 years ago

Are you deploying this on Azure VMs?

erjosito commented 4 years ago

@abhilashaagarwala Yes

erjosito commented 3 years ago

Any idea what I can troubleshoot or try? Thanks!

nwoodmsft commented 3 years ago

You mentioned that you are attempting to install AksHci, but the script is indicating that you still have Windows Admin Center configuration present (which is a sign that Windows Admin Center was used to deploy AksHci on this cluster/node previously and no uninstall has been performed yet):

- Loading Windows Admin Center configuration from 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json'...

Did your previous Windows Admin Center deployment complete successfully? I am wondering if you had a failed deployment through WAC and some aspects of that deployment are still present and have not been cleaned up yet.

We are improving the PowerShell "Uninstall-AksHci" cmdlet to be more resilient to this type of issue/state moving forward. For now, you may need to perform some manual cleanup on the cluster/nodes if you are planning to switch to using PowerShell for the Day 0 deployment of AksHci.
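
A quick way to see whether this kind of leftover state is present is to check for the two files that show up in the output above. This is only a rough sketch: the helper name `Test-AksHciLeftoverState` is hypothetical (it is not part of the AksHci module), and the paths in the example are the ones from this thread, with `<node>` standing in for the node name.

```powershell
# Hypothetical helper (not part of the AksHci module): report which of the
# files mentioned in this thread currently exist.
function Test-AksHciLeftoverState {
    param(
        # Windows Admin Center settings file that the module tries to merge.
        [Parameter(Mandatory)][string]$WacSettingsPath,
        # cloudconfig access file that Copy-Item failed to find.
        [Parameter(Mandatory)][string]$CloudConfigPath
    )

    [pscustomobject]@{
        WacSettingsPresent = Test-Path -LiteralPath $WacSettingsPath
        CloudConfigPresent = Test-Path -LiteralPath $CloudConfigPath
    }
}

# Example, using the paths from the output in this thread
# (replace <node> with the node name from your own error message):
# Test-AksHciLeftoverState `
#     -WacSettingsPath 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json' `
#     -CloudConfigPath '\\<node>\C$\Users\labadmin\.wssd\cloudconfig'
```

If the settings file is present but the cloudconfig file is not, that would be consistent with the situation described in this issue, though it does not prove that a failed Windows Admin Center deployment is the cause.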

erjosito commented 3 years ago

Hey @nwoodmsft, thanks for your answer! Yes, I tried WAC first, but after 2h it still didn't work. After retrying with WAC a couple of times I switched over to pwsh. Would you have a link on how to "perform some manual cleanup on the cluster/nodes" to switch to pwsh from WAC?

erjosito commented 3 years ago

I renamed C:\users\labadmin\Windows Admin Center\aks-hci-settings.json and ran set-akshciconfig on each node, then install-akshci on one of the nodes. There was an error when removing the cloudagent directories; not sure whether this is going to affect the process...

- Removing cloudagent directory on WIN-PSIERIALC37...
Access is denied
    + CategoryInfo          : NotSpecified: (:) [Remove-Item], Win32Exception
    + FullyQualifiedErrorId : System.ComponentModel.Win32Exception,Microsoft.PowerShell.Commands.RemoveItemCommand
    + PSComputerName        : WIN-PSIERIALC37

erjosito commented 3 years ago

I think we can close this one, since deleting/renaming C:\users\labadmin\Windows Admin Center\aks-hci-settings.json did the trick :-)
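
For anyone landing here with the same error, the workaround above can be sketched roughly as follows. This is a minimal sketch, not an official cleanup procedure: the helper name `Rename-WacAksHciSettings` and the `.bak` suffix are hypothetical choices, the default path simply mirrors the one in this thread, and `Set-AksHciConfig` / `Install-AksHci` are shown without parameters because that is how they were run here (your environment may need some).

```powershell
# Hypothetical helper: move the leftover Windows Admin Center settings file
# out of the way so that it is no longer picked up and merged.
function Rename-WacAksHciSettings {
    param(
        # Default mirrors the path seen in this thread; adjust as needed.
        [string]$Path = (Join-Path $env:USERPROFILE 'Windows Admin Center\aks-hci-settings.json')
    )

    if (-not (Test-Path -LiteralPath $Path)) {
        # Nothing to rename on this node.
        return $null
    }

    # '.bak' is an arbitrary suffix chosen for this sketch.
    $backupName = (Split-Path -Path $Path -Leaf) + '.bak'
    Rename-Item -LiteralPath $Path -NewName $backupName -PassThru
}

# On every node:
#   Rename-WacAksHciSettings
#   Set-AksHciConfig
# Then, on one node only:
#   Install-AksHci
```

Renaming rather than deleting keeps the old settings around in case they are needed for comparison later.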