
[BUG] kubectl logs return "error: You must be logged in to the server (the server has asked for the client to provide credentials)" #285

Closed: baziwane closed 1 year ago

baziwane commented 1 year ago

There is an issue with AKS hybrid in which a cluster can stop returning logs. When this happens, running kubectl logs returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)". AKS hybrid rotates core Kubernetes certificates every 4 days, but sometimes the Kubernetes API server doesn't immediately reload its client certificate for communication with kubelet when the certificates update.

Root cause

This issue is caused by a known bug in upstream Kubernetes (issue #114588), with PR #115 opened to resolve it.
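
A quick way to check whether you are hitting this certificate problem rather than a genuine RBAC issue is to look for certificate or authorization errors in the kube-apiserver logs on a control plane node. The one-liner below is only a sketch: it reuses the same SSH access as the mitigation steps that follow, and the grep pattern is illustrative rather than an exact error string.

# Inspect recent kube-apiserver logs for certificate/authorization errors (illustrative filter)
ssh -i (get-akshciconfig).Moc.sshPrivateKey clouduser@<CONTROL_PLANE_IP> 'sudo crictl logs --tail 200 $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id) 2>&1 | grep -iE "certificate|unauthorized"'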

Mitigation

To mitigate the issue, there are several options:

Rerun kubectl logs. For example, run the following PowerShell command:

while (1) {kubectl logs <POD_NAME>; sleep 1}
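
If you would rather have the loop stop as soon as the API server accepts the request again, a small variation on the same idea (a sketch, not part of the original guidance) is:

# Retry kubectl logs until it succeeds, then stop
while ($true) {
    kubectl logs <POD_NAME>
    if ($LASTEXITCODE -eq 0) { break }
    Start-Sleep -Seconds 1
}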

Restart the kube-apiserver container on each control plane node of the cluster. Restarting the API server does not impact running workloads. To restart the API server, follow these steps:

Get the IP address of each control plane node in your cluster:

kubectl get nodes -o wide
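
If the cluster has several control plane nodes, you can also pull just their internal IPs with a jsonpath query. This is a sketch: the node-role.kubernetes.io/control-plane label is the upstream convention and may appear as node-role.kubernetes.io/master on older Kubernetes versions.

# Print only the InternalIP of each control plane node
kubectl get nodes -l node-role.kubernetes.io/control-plane -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'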

Run the following command for each control plane IP:

ssh -i (get-akshciconfig).Moc.sshPrivateKey clouduser@<CONTROL_PLANE_IP> 'sudo crictl stop $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id)'
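
Because the restart has to happen on every control plane node, a short PowerShell loop over the IPs from the previous step saves repeating the command by hand. This is only a sketch that combines the commands above; $controlPlaneIps is a hypothetical variable you fill in with your own node IPs.

# Restart kube-apiserver on every control plane node (sketch; fill in your own IPs)
$controlPlaneIps = @('<CONTROL_PLANE_IP_1>', '<CONTROL_PLANE_IP_2>')
$key = (Get-AksHciConfig).Moc.sshPrivateKey
foreach ($ip in $controlPlaneIps) {
    ssh -i $key "clouduser@$ip" 'sudo crictl stop $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id)'
}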

Optionally, but not recommended for production workloads, you can ask kube-apiserver not to verify the server certificate of the kubelet:

kubectl logs <POD_NAME> --insecure-skip-tls-verify-backend=true

plumbery commented 1 year ago

This issue appeared after upgrading the Kubernetes version of my workload cluster. The workaround below worked for me.

ssh -i (get-akshciconfig).Moc.sshPrivateKey clouduser@ 'sudo crictl stop $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id)'

Thanks.