To Reproduce
Scale down some deployment of app unable to stop in the allowed grace period.
Expected behavior
No alert if pods killed after grace period of a normal cluster behavior (scaling, rearrangement,..)
Actual behavior
kubelet log :
I0621 09:54:32.178488 1975 kuberuntime_container.go:742] "Killing container with a grace period" pod="google-sync-app-master/hutch-65d56f4988-9vw5f" podUID=1caade92-2bb7-4542-a0e0-acdee0df6c47 containerName="hutch-container" containerID="containerd://4031df8df9362d4df69ace1af956eb7430fa0b4e819f4059c54c790e28a2bd61" gracePeriod=30
kwatch log triggering notif :
{"level":"info", "msg":"sending event: {PodName:hutch-65d56f4988-9vw5f ContainerName:hutch-container Namespace:google-sync-app-master Reason:Error Events:[2024-06-21 09:54:32 +0000 UTC] Killing Stopping container hutch-container Logs:Docker Starting hutch in hutch-start.sh
ENVKEY_ENV: preprod
PING rabbitmq.rabbitmq.svc.cluster.local:15672
RabbitMQ is UP!
2024-06-19T08:30:08Z 19 INFO -- writing pid in /home/effilab/tmp/hutch.pid
2024-06-19T08:30:08Z 19 INFO -- hutch booted with pid 19
2024-06-19T08:30:08Z 19 INFO -- found rails project (.), booting app in preprod environment
Labels:map[app:google-sync-app-master pod-template-hash:65d56f4988 role:hutch]}"}
Version/Commit
All fine before 0.9
Notification not triggered with the 0.9 but could be a consequence of others bug fixed in subsequent releases like 0.9 logs give these sorts of logs :
{"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}
Describe the bug Sounds the ignore failed graceful shutdown feature not working correctly since last versions. It was fine before before 0.9
Sounds some consequences of refactoring done in https://github.com/abahmed/kwatch/pull/280 in particular https://github.com/abahmed/kwatch/blob/main/filter/containerKillingFilter.go
To Reproduce Scale down some deployment of app unable to stop in the allowed grace period.
Expected behavior No alert if pods killed after grace period of a normal cluster behavior (scaling, rearrangement,..)
Actual behavior
Version/Commit All fine before 0.9 Notification not triggered with the 0.9 but could be a consequence of others bug fixed in subsequent releases like 0.9 logs give these sorts of logs : {"level":"info","msg":"container only issue nginx tag-xy-6ff64687c7-zsmb4 tag-xy-6ff64687c7 Error 137","time":"2024-06-21T14:53:19Z"}