Azure / aksArc

# Welcome to the Azure Kubernetes Service on Azure Stack HCI repo

This is where the AKS-HCI team will track features and issues with AKS-HCI. We will monitor this repo to engage with our community and discuss questions, customer scenarios, or feature requests. Check out our Projects tab to see the roadmap for AKS-HCI!
MIT License

[BUG] Set-AksHciConfig Crashes After Two+ Hours #354

Closed: SantosVictorero closed this issue 9 months ago

SantosVictorero commented 9 months ago

When I run Set-AksHciConfig, it keeps running for over two hours:

[screenshot]

Then it displays this error:

[screenshot]

I have been using AksHci/AksHybrid for a while and have never seen this error before.

Expected behavior: Set-AksHciConfig completes and the AksHci configuration is applied.


I also noticed several errors related to FailoverClustering-Client in the System log:

LogExtendedErrorInformation (974): Extended RPC error information:
ProcessID is 2624
System time is: 55/473/7847 584:0:6176:32037
Generating component is 2
Status is 1753
Detection location is 501
Flags is 0
NumberOfParameters is 4
Unicode string: ncacn_ip_tcp
Unicode string: CBSI-AKSHCI-SRV
Long val: -1182943054
Long val: 382312662

It looks like Failover Clustering was installed by Initialize-AksHciNode, even though I am running just one node.

Get-AksHciLogs returns the following error:

[screenshot]

Elektronenvolt commented 9 months ago

I've noticed that the validation checks were improved in the last release(s). They discovered an IP range issue in one of my setups. In your setup, can you reach a VM at 172.16.10.1 from your host? Is there a firewall between the host VLAN and the AKS Hybrid VLAN? The validation checks now create a VM and test connectivity, which helps a lot in avoiding problems after setup. I see error messages like these in test setups where there is no route from the host IP to the IP ranges configured for the AKS Hybrid node pool.

SantosVictorero commented 9 months ago

Thanks for your comments, @Elektronenvolt.

I did not check that, since it was working fine in previous versions with the same configuration. I will check whether any new firewall ports need to be opened.


SantosVictorero commented 9 months ago

I found the problem: somehow the virtual switch used by the virtual machines got corrupted. After I deleted and recreated the virtual switch, the installation completed successfully.