datreeio / admission-webhook-datree

Datree offers a cluster integration that uses an admission webhook to validate your resources against your configured policy when you push them into a cluster.
https://datree.io/
Apache License 2.0

Webhook failed on node restart #62

Closed: atropos112 closed this 2 years ago

atropos112 commented 2 years ago

I'm running a hobby home setup: k3s on 3 nodes. After installing admission-webhook-datree (free tier) on 2 nodes using

bash <(curl https://get.datree.io/admission-webhook)

Everything worked fine. A few hours later I had to restart a node (for unrelated reasons), and that is when I noticed worrying behaviour: the node would connect for a brief moment and then drop out again, repeating this a few times before giving up and killing the k3s process.

I decided to restart the other nodes as well, and all of them exhibited the same behaviour. Eventually I saw this in the logs:

level=error msg="Failed to connect to proxy" error="websocket: bad handshake"

So I decided to try uninstalling the Datree admission webhook (during one of the brief moments when the nodes were connected), and this resolved the problem. I am wondering what I did wrong to cause such a big issue; surely the webhook should not interfere in any way with a node coming back up.

myishay commented 2 years ago

Hmm, that is interesting. What k3s version are you using? Did you have any special configuration or resources installed on the cluster?

atropos112 commented 2 years ago

Hello, I'm on version v1.23.4+k3s1, and I don't think I have any special configuration. This appears to be proxy-related; I am running kube-vip, but its settings are pretty standard.

myishay commented 2 years ago

Hi @atropos112

I tried to reproduce this issue:

  1. I created a local k3s cluster using k3d with 3 nodes
  2. I installed an example app following this tutorial
  3. I installed the latest Datree admission webhook
  4. I ran kubectl apply on a few relevant resources, and Datree worked as expected: it blocked the resources that didn't follow the default policy
  5. I then restarted the nodes, and this also worked fine

After that, a quick search for the error message led me to this issue and to a comment that could be relevant: https://github.com/rancher/rancher/issues/20651#issuecomment-515653801

Could your problem be related to that issue?

If not, do you have a way to reproduce this?

eyarz commented 2 years ago

@atropos112 did you have a chance to check it? We want to know if we can close this issue.

atropos112 commented 2 years ago

Hello, sorry for the delay; I was on holiday and didn't have access to my cluster to try. What appears to have caused the issue was that the kube-vip pods were crashlooping. I concluded that Datree was the likely culprit because once I deleted it, the crashlooping stopped and everything worked. However, I have now tried again and the pods are not crashlooping, which suggests I was in fact wrong. I have no idea why removing the Datree admission webhook changed anything at the time. Thank you for checking! I'll close this, since the problem seems to lie elsewhere. Thanks for your investigation!