jayunit100 closed this issue 2 years ago.
1008 19:07:47.451605 1 log.go:172] http: TLS handshake error from 172.17.0.2:61958: remote error: tls: bad certificate
So I guess this is related to certificates.
Is there a way, when the Antrea apiserver is down, to keep it from interfering with how the K8s apiserver behaves?
These APIServices are served by the Antrea Controller. Is there any issue with your Controller deployment or any connectivity issue between your K8s apiserver and the Controller Pod?
That's a property of APIServices, and there is not much that can be done. When a namespace is deleted, K8s contacts all APIServices to check whether any resources need to be deleted. If an APIService is not available, the namespace deletion gets stuck. There are countless reports online of this happening because of the metrics server.
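To see which APIServices could be blocking a namespace deletion, one option is to look for APIServices whose Available condition is not True. The following Go sketch does this over the JSON printed by kubectl get apiservices -o json. It uses only the standard library, models only the few fields it reads, and the sample input in main is illustrative rather than taken from the cluster in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// apiServiceList models only the fields of `kubectl get apiservices -o json`
// output that this sketch reads.
type apiServiceList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// unavailableAPIServices returns the names of APIServices whose "Available"
// condition is missing or not "True". Discovery for these groups is expected
// to fail, which can leave a namespace stuck in Terminating.
func unavailableAPIServices(data []byte) ([]string, error) {
	var list apiServiceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var names []string
	for _, item := range list.Items {
		available := false
		for _, c := range item.Status.Conditions {
			if c.Type == "Available" && c.Status == "True" {
				available = true
				break
			}
		}
		if !available {
			names = append(names, item.Metadata.Name)
		}
	}
	return names, nil
}

func main() {
	// Illustrative input; in practice, feed in the output of
	// `kubectl get apiservices -o json`.
	sample := []byte(`{"items": [
		{"metadata": {"name": "v1.apps"},
		 "status": {"conditions": [{"type": "Available", "status": "True"}]}},
		{"metadata": {"name": "v1beta1.metrics.k8s.io"},
		 "status": {"conditions": [{"type": "Available", "status": "False"}]}}
	]}`)
	names, err := unavailableAPIServices(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, n := range names {
		fmt.Println("unavailable:", n)
	}
}
```

On the sample input it prints a single line, unavailable: v1beta1.metrics.k8s.io. An APIService with no status at all is also reported, since it has no Available=True condition.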
BTW, one fix (if you don't want to restore connectivity between kube-apiserver and the Antrea Controller first) is to delete the Antrea APIServices.
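A minimal sketch of that workaround, assuming that the Antrea APIServices are the ones whose spec.service points at a Service named antrea in kube-system (verify this with kubectl get apiservices on your own cluster before relying on it). Rather than deleting anything itself, it prints the corresponding kubectl delete apiservice commands so they can be reviewed first. The APIService names in the sample input are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// apiServiceList models only the fields of `kubectl get apiservices -o json`
// output that this sketch reads.
type apiServiceList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			// Service is null for APIServices served locally by kube-apiserver.
			Service *struct {
				Namespace string `json:"namespace"`
				Name      string `json:"name"`
			} `json:"service"`
		} `json:"spec"`
	} `json:"items"`
}

// deleteCommands returns a `kubectl delete apiservice` command for every
// APIService backed by the Service svcNamespace/svcName.
func deleteCommands(data []byte, svcNamespace, svcName string) ([]string, error) {
	var list apiServiceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var cmds []string
	for _, item := range list.Items {
		svc := item.Spec.Service
		if svc != nil && svc.Namespace == svcNamespace && svc.Name == svcName {
			cmds = append(cmds, "kubectl delete apiservice "+item.Metadata.Name)
		}
	}
	return cmds, nil
}

func main() {
	// Illustrative input; the Antrea-backed APIService name is hypothetical.
	sample := []byte(`{"items": [
		{"metadata": {"name": "v1.apps"}, "spec": {"service": null}},
		{"metadata": {"name": "v1beta1.example.antrea"},
		 "spec": {"service": {"namespace": "kube-system", "name": "antrea"}}}
	]}`)
	// Assumption: the Antrea Controller API is exposed via Service kube-system/antrea.
	cmds, err := deleteCommands(sample, "kube-system", "antrea")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

Matching on the backing Service rather than on name patterns avoids touching local APIServices (whose spec.service is null) or APIServices from other add-ons such as the metrics server.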
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days.
@antoninbas - I think the use of API aggregation introduces fragility by increasing the blast radius of a failure or misconfiguration, especially in GitOps environments, where it could cause the very updates meant to fix a broken cluster to fail because API resource listing breaks in controllers like Flux.
Why not just convert the APIService aggregations to direct calls to the controller?
@moshloop Thanks for the feedback. I tend to agree with your statement. IIRC the main reason why we chose to use aggregation in the first place was to allow easy access to resource URLs from the antctl command-line tool, without having to worry about endpoint discovery / authentication. Recently we have added more commands to antctl which access non-resource URLs in the controller and for which API aggregation doesn't help. We are planning to refactor the antctl framework to better support such commands and it is probably a good time to consider removing the dependency on Antrea API aggregation altogether. Related discussion: https://github.com/vmware-tanzu/antrea/pull/2082#discussion_r617899358
Describe the bug
It seems like basic operations like
kubectl delete ns blah
fail on Antrea namespaces with the latest Ubuntu image.
To Reproduce
Create an Antrea kind cluster, run a conformance test, and delete the namespace.
Expected
The Antrea apiserver should never fail resource lookups, or, if it does, it should swallow the error so that it does not block the K8s apiserver from performing basic operations.
Actual behavior
The Antrea apiserver causes the K8s apiserver to fail because of a resource lookup operation.
Version
antrea-ubuntu:latest
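Kubernetes records this kind of failure on the namespace itself: a namespace stuck in Terminating because discovery failed for some API group should carry a status condition of type NamespaceDeletionDiscoveryFailure with status True. The Go sketch below checks for that condition in the JSON printed by kubectl get namespace NAME -o json, which is one way to confirm that a stuck kubectl delete ns is caused by an unavailable APIService. It models only the fields it reads, and the message in the sample input is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// namespaceStatus models only the fields of `kubectl get namespace NAME -o json`
// output that this sketch reads.
type namespaceStatus struct {
	Status struct {
		Conditions []struct {
			Type    string `json:"type"`
			Status  string `json:"status"`
			Message string `json:"message"`
		} `json:"conditions"`
	} `json:"status"`
}

// discoveryFailure reports whether the namespace carries a true
// NamespaceDeletionDiscoveryFailure condition and returns its message.
func discoveryFailure(data []byte) (bool, string, error) {
	var ns namespaceStatus
	if err := json.Unmarshal(data, &ns); err != nil {
		return false, "", err
	}
	for _, c := range ns.Status.Conditions {
		if c.Type == "NamespaceDeletionDiscoveryFailure" && c.Status == "True" {
			return true, c.Message, nil
		}
	}
	return false, "", nil
}

func main() {
	// Illustrative input for a namespace stuck in Terminating; the message text is made up.
	sample := []byte(`{"status": {"phase": "Terminating", "conditions": [
		{"type": "NamespaceDeletionDiscoveryFailure", "status": "True",
		 "message": "discovery failed for an aggregated API group (illustrative)"}
	]}}`)
	failed, msg, err := discoveryFailure(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if failed {
		fmt.Println("blocked by discovery failure:", msg)
	} else {
		fmt.Println("no discovery failure reported")
	}
}
```

If the condition is present, its message normally names the failing API groups, which points back to the APIService that needs attention.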