Ignore Failed Graceful Shutdown not working

thomasLeclaire commented 3 weeks ago

Describe the bug: It seems the ignore failed graceful shutdown feature has not been working correctly in recent versions. It worked fine before 0.9.

This looks like a consequence of the refactoring done in https://github.com/abahmed/kwatch/pull/280, in particular in https://github.com/abahmed/kwatch/blob/main/filter/containerKillingFilter.go

To reproduce: Scale down a deployment whose app cannot stop within the allowed grace period.

Expected behavior: No alert when pods are killed after their grace period expires as part of normal cluster behavior (scaling, rearrangement, ...).

Actual behavior

kubelet log:

I0621 09:54:32.178488    1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" podUID=1caade92-2bb7-4542-a0e0-acdee0df6c47 containerName="hutch-container" containerID="containerd://4031df8df9362d4df69ace1af956eb7430fa0b4e819f4059c54c790e28a2bd61" gracePeriod=30
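
To correlate kubelet output with kwatch alerts, the structured fields in this line (`pod`, `containerName`, `gracePeriod`) can be extracted programmatically. A minimal Go sketch, assuming the klog `key="value"` / `key=value` layout seen above; `parseKubeletKV` is a hypothetical helper, not part of kwatch or the kubelet, and it is not a complete klog parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// kvPattern matches klog-style structured pairs: key="quoted value" or key=bare.
var kvPattern = regexp.MustCompile(`(\w+)=("(?:[^"\\]|\\.)*"|\S+)`)

// parseKubeletKV extracts key/value pairs from a structured kubelet log line.
// Quoted values are unquoted with strconv.Unquote; bare values are kept as-is.
func parseKubeletKV(line string) map[string]string {
	out := make(map[string]string)
	for _, m := range kvPattern.FindAllStringSubmatch(line, -1) {
		val := m[2]
		if unq, err := strconv.Unquote(val); err == nil {
			val = unq
		}
		out[m[1]] = val
	}
	return out
}

func main() {
	line := `I0621 09:54:32.178488    1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" containerName="hutch-container" gracePeriod=30`
	kv := parseKubeletKV(line)
	grace, err := strconv.Atoi(kv["gracePeriod"])
	if err != nil {
		fmt.Println("no numeric gracePeriod:", err)
		return
	}
	fmt.Println(kv["pod"], kv["containerName"], grace)
	// prints: google-sync-app-master/hutch-65d56f4988-9vw5f hutch-container 30
}
```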

kwatch log entry that triggered the notification:

{"level":"info", "msg":"sending event: {PodName:hutch-65d56f4988-9vw5f ContainerName:hutch-container Namespace:google-sync-app-master Reason:Error Events:[2024-06-21 09:54:32 +0000 UTC] Killing Stopping container hutch-container Logs:Docker Starting hutch in hutch-start.sh
ENVKEY_ENV: preprod
PING rabbitmq.rabbitmq.svc.cluster.local:15672
RabbitMQ is UP!
2024-06-19T08:30:08Z 19 INFO -- writing pid in /home/effilab/tmp/hutch.pid
2024-06-19T08:30:08Z 19 INFO -- hutch booted with pid 19
2024-06-19T08:30:08Z 19 INFO -- found rails project (.), booting app in preprod environment
Labels:map[app:google-sync-app-master pod-template-hash:65d56f4988 role:hutch]}"}

Version/Commit: Everything was fine before 0.9. With 0.9 itself the notification was not triggered, but that may have been a side effect of other bugs fixed in subsequent releases; 0.9 produces log entries like this instead: {"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}
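
The `Error 137` in the 0.9 log line is itself a hint. Exit codes above 128 conventionally mean the process was terminated by signal (code - 128), so 137 = 128 + 9 is SIGKILL, which is what a container receives once its grace period (30 s in the kubelet log) expires after SIGTERM. A small Go sketch of that decoding; `describeExitCode` and the signal table are hypothetical helpers using Linux signal numbers. An OOM kill also yields 137, so the exit code alone cannot separate a post-grace-period kill from an OOM kill; the terminated reason (`Error` vs `OOMKilled`) does.

```go
package main

import "fmt"

// signalNames maps a few common Linux signal numbers to their names.
var signalNames = map[int]string{9: "SIGKILL", 15: "SIGTERM"}

// describeExitCode interprets a container exit code using the common shell
// convention that codes above 128 mean "terminated by signal (code - 128)".
func describeExitCode(code int) string {
	if code <= 128 {
		return fmt.Sprintf("exit %d: not a signal exit", code)
	}
	sig := code - 128
	if name, ok := signalNames[sig]; ok {
		return fmt.Sprintf("exit %d: terminated by signal %d (%s)", code, sig, name)
	}
	return fmt.Sprintf("exit %d: terminated by signal %d", code, sig)
}

func main() {
	fmt.Println(describeExitCode(137)) // exit 137: terminated by signal 9 (SIGKILL)
	fmt.Println(describeExitCode(143)) // exit 143: terminated by signal 15 (SIGTERM)
}
```

An app that handles SIGTERM and exits in time typically ends with 0 or 143; only an app that overruns its grace period ends with 137 and reason `Error`, which is exactly the case this issue wants ignored.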

alexremn commented 5 days ago

@abahmed Good day! Do you have plans to fix this? We are getting a lot of false-positive messages.

abahmed commented 5 days ago

@alexremn Yes, I'm working on a fix; it should land in the next few days.

Issue #323 in abahmed/kwatch