abahmed / kwatch

:eyes: monitor & detect crashes in your Kubernetes(K8s) cluster instantly
https://kwatch.dev
MIT License

Ignore Failed Graceful Shutdown not working #323

Open thomasLeclaire opened 3 weeks ago

thomasLeclaire commented 3 weeks ago

Describe the bug The ignore failed graceful shutdown feature seems to have stopped working correctly in recent versions. It worked fine before 0.9.

This looks like a consequence of the refactoring done in https://github.com/abahmed/kwatch/pull/280, in particular https://github.com/abahmed/kwatch/blob/main/filter/containerKillingFilter.go

To Reproduce Scale down a deployment whose app is unable to stop within the allowed grace period.

Expected behavior No alert when pods are killed after the grace period as part of normal cluster behavior (scaling, rearrangement, ...).
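To make the expected behavior concrete, here is a minimal sketch in Go of the decision I would expect. It is not kwatch's actual code: `ContainerState`, `Config`, `ShouldAlert` and the `podDeleting` parameter are hypothetical names made up for illustration. It assumes a container that is force-killed after the grace period shows up with reason `Error` and exit code 137 (128 + SIGKILL), and that the pod is being deleted at that time.

```go
package main

// ContainerState is a hypothetical, simplified view of a terminated
// container. It is NOT kwatch's real type; the fields are assumptions.
type ContainerState struct {
	Reason   string // termination reason, e.g. "Error" or "OOMKilled"
	ExitCode int32  // 137 = 128 + 9 (SIGKILL)
}

// Config carries a hypothetical flag standing in for the
// "ignore failed graceful shutdown" option.
type Config struct {
	IgnoreFailedGracefulShutdown bool
}

// exitCodeSigKill is the exit code of a process terminated by SIGKILL.
const exitCodeSigKill = 137

// ShouldAlert reports whether a terminated container should raise an alert.
// podDeleting is assumed to be true when the pod is being removed on purpose
// (scale down, rescheduling), i.e. it has a deletion timestamp.
func ShouldAlert(cfg Config, st ContainerState, podDeleting bool) bool {
	// A container killed after its grace period during a normal deletion:
	// SIGKILL exit code, generic "Error" reason, pod on its way out.
	killedAfterGrace := st.ExitCode == exitCodeSigKill &&
		st.Reason == "Error" &&
		podDeleting

	if cfg.IgnoreFailedGracefulShutdown && killedAfterGrace {
		return false // expected cluster behavior, stay silent
	}
	return true // anything else is still worth reporting
}
```

With the flag enabled, `ShouldAlert(cfg, ContainerState{Reason: "Error", ExitCode: 137}, true)` returns `false`, while an `OOMKilled` container with the same exit code still alerts. Checking the reason as well as the exit code is a design choice in this sketch, so that out-of-memory kills are not silenced by accident.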

Actual behavior

I0621 09:54:32.178488    1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" podUID=1caade92-2bb7-4542-a0e0-acdee0df6c47 containerName="hutch-container" containerID="containerd://4031df8df9362d4df69ace1af956eb7430fa0b4e819f4059c54c790e28a2bd61" gracePeriod=30

Version/Commit Everything worked before 0.9. With 0.9 the notification is not triggered, but that could be a consequence of other bugs that were fixed in subsequent releases. For example, 0.9 produces log lines like this: {"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}

alexremn commented 5 days ago

@abahmed good day! Do you have plans to fix this? We are getting lots of false positive messages.

abahmed commented 5 days ago

@alexremn Yes, I'm working on a fix and it should land in the next few days.