Open axel7083 opened 3 days ago
After some investigation, I ran podman-desktop in the debugger to check when `reachable` was set to true.
Here is the trace: inside the `informer.on` `connect` listener, we are getting an undefined `err`.
Here is the stack from the debugger:
anonymous(), contexts-manager.ts:898
Async call from Timeout
setReachable(), contexts-manager.ts:892
setReachableDelay(), contexts-manager.ts:876
anonymous(), contexts-manager.ts:862
restartInformer(), contexts-manager.ts:925
createInformer(), contexts-manager.ts:866
createPodInformer(), contexts-manager.ts:383
createKubeContextInformers(), contexts-manager.ts:309
update(), contexts-manager.ts:238
refresh(), kubernetes-client.ts:499
Async call from await
anonymous(), contexts-manager.ts:862
restartInformer(), contexts-manager.ts:925
createInformer(), contexts-manager.ts:866
createPodInformer(), contexts-manager.ts:383
createKubeContextInformers(), contexts-manager.ts:309
update(), contexts-manager.ts:238
refresh(), kubernetes-client.ts:499
Async call from await
I am not familiar with the kubernetes npm package, but maybe they do not throw an error when we force them to start?
As far as I can remember, we never get the connection failure with the `connect` event, but with the `error` event only. We set the context to `reachable` when trying to connect, and to `unreachable` when an error occurs. There is no other way, as we do not get an event when we are effectively connected (except some ADDED events, but only if there are resources in the context).
I'm not sure I understand the output of your `kubectl get pods` commands: are you getting the error immediately, or after 30s?
When the error happens immediately after the connect (which is the case with kind, or with a cluster whose machine is accessible), the `reachable` status is immediately overridden by the status set on error, and we never see it. But if the error comes after 30s, the cluster will be seen as reachable for 30s.
More info at: https://github.com/containers/podman-desktop/issues/7629
When we call `informer.start()` we will never receive an error: the start method calls the `doneHandler` with a null value, then sends undefined to all `connect` listeners.
From my understanding, the problem is the following:
They send an event to the connect listeners with no error before trying the `listFn` function, which would time out, meaning we should probably not set the cluster reachable from inside the connect listener.
Here is a diagram of what is happening:
sequenceDiagram
Context-Manager-->>Informer: register connect listener
Context-Manager-->>Informer: register error listener
loop Forever
Context-Manager->>Informer: start()
Informer->>Context-Manager: call connect listener (no error)
Context-Manager-->>Context-Manager: set reachable true
Note right of Informer: a few seconds later
Informer->>Context-Manager: call error listener (timeout)
Context-Manager-->>Context-Manager: set reachable false
end
To give a better illustration, here is an accelerated video of what is happening visually:
https://github.com/user-attachments/assets/6823a258-e27b-42b2-85e7-c4cf444e7be0
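The sequence in the diagram can be reproduced with a small simulation. This is only a sketch: `FakeInformer` and `traceReachable` are hypothetical names, not part of the kubernetes npm package or of contexts-manager.ts, and the error is emitted synchronously here, whereas in the real case it arrives some seconds later.

```typescript
// Hypothetical stand-in for the informer; it only mimics the ordering of
// events described in this thread, not the real implementation.
type Listener = (err?: unknown) => void;

class FakeInformer {
  private listeners: { connect: Listener[]; error: Listener[] } = { connect: [], error: [] };

  on(event: 'connect' | 'error', listener: Listener): void {
    this.listeners[event].push(listener);
  }

  // Connect listeners are called with no error before the list request has
  // had a chance to fail; the failure is only reported through 'error'.
  start(listFails: boolean): void {
    this.listeners.connect.forEach(listener => listener(undefined));
    if (listFails) {
      this.listeners.error.forEach(listener => listener(new Error('timeout')));
    }
  }
}

// Records every value the reachable flag takes during one start() call.
function traceReachable(listFails: boolean): boolean[] {
  const history: boolean[] = [];
  const informer = new FakeInformer();
  informer.on('connect', err => {
    if (!err) history.push(true); // reachable is set on connect
  });
  informer.on('error', () => history.push(false)); // and reset on error
  informer.start(listFails);
  return history;
}
```

With an unreachable cluster, `traceReachable(true)` records `true` followed by `false`, which is the flicker visible in the video.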
> When we call `informer.start()` we will never receive an error: the start method calls the `doneHandler` with a null value, then sends undefined to all `connect` listeners.
> From my understanding, the problem is the following:
> They send an event to the connect listeners with no error before trying the `listFn` function, which would time out, meaning we should probably not set the cluster reachable from inside the connect listener.
Yes, this is what I wanted to explain.
The problem is that we never receive an event telling us that we are effectively connected. The only way would be to consider ourselves connected after some timeout if we did not receive an error (or if we receive ADDED events, but this does not happen on contexts where there are no pods). Or to use a direct HTTP request like in #7629, where we would get an acknowledgement of the connection.
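The timeout idea could look roughly like the sketch below. `ReachabilityTracker`, its method names, and the `graceMs` parameter are all hypothetical; the grace period would have to be longer than the slowest expected error (about 30s in the case discussed here), which is the main drawback of this approach.

```typescript
type ReachabilityState = 'connecting' | 'reachable' | 'unreachable';

// Hypothetical helper: treats a context as reachable only once a grace
// period has elapsed without an error, or as soon as a resource is seen.
class ReachabilityTracker {
  private readonly graceMs: number;
  private startedAt: number | undefined;
  private failed = false;
  private sawResource = false;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  // Called from the connect listener; starts a new attempt.
  onConnect(now: number): void {
    this.startedAt = now;
    this.failed = false;
    this.sawResource = false;
  }

  // Called from the error listener.
  onError(): void {
    this.failed = true;
  }

  // Called when an ADDED event is received.
  onAdded(): void {
    this.sawResource = true;
  }

  state(now: number): ReachabilityState {
    if (this.failed) return 'unreachable';
    if (this.sawResource) return 'reachable';
    if (this.startedAt !== undefined && now - this.startedAt >= this.graceMs) return 'reachable';
    return 'connecting';
  }
}
```

Timestamps are passed in explicitly so the logic can be exercised without real timers; a caller would typically pass `Date.now()`.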
I don't think we can reasonably ask for changes to the informer behaviour. This implementation is based on the Go implementation and they try to keep them in sync, and I'm pretty sure they are happy with the current behaviour. The best change we could make IMHO would be to check connectivity with a simple HTTP request (#7629, or some get version request).
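A connectivity check of that kind might be sketched as follows. This is not the change proposed in #7629: `isClusterReachable` and `FetchLike` are hypothetical names, the fetch function is injected so that TLS and credentials stay the caller's concern, and treating any HTTP response (including 401 or 403) as proof of reachability is an assumption of this sketch. The `/version` path is the Kubernetes API server's version endpoint.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<unknown>;

// Resolves to true if the API server answers within timeoutMs, false if the
// request fails or is aborted by the timeout.
async function isClusterReachable(server: string, fetchFn: FetchLike, timeoutMs: number): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Any response means the server answered, whatever its status code.
    await fetchFn(`${server}/version`, { signal: controller.signal });
    return true;
  } catch {
    // Connection refused, DNS failure, or abort after the timeout.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

Unlike the informer's `connect` event, the result of this request is a positive acknowledgement, so `reachable` would only be set once the server has actually replied.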
thanks @feloy for the explanations and details 👍
Keeping this open, as it is a problem on its own, but should be resolved when https://github.com/containers/podman-desktop/issues/7629 is implemented
Bug description
I have an OpenShift cluster behind a VPN, meaning the cluster cannot be reached when the VPN is not connected. The error is not traditional when using `kubectl`.
However inside Podman Desktop (current main is https://github.com/containers/podman-desktop/commit/6eb2c161345cc19f7868157ae65062ad8bcbbba4) I am seeing the following
Obviously nothing is visible in the Kubernetes pages, because the cluster is not reachable
Operating system
Windows 11
Installation Method
Other
Version
next (development version)
Steps to reproduce
No response
Relevant log output
main ↪️ Trying to watch deployments on the kubernetes context named ":6443/astefani" but got a connection refused, retrying the connection in 1s. FetchError: request to https://:6443/apis/apps/v1/namespaces/rhoai-internal--astefani-nb/deployments failed, reason: )
...
main ↪️ Error while fetching API groups: FetchError: request to https://:6443/apis failed, reason:
Additional context
No response