Currently, I am using the nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2 image to create an InferenceService by following the provided guide, and the service runs properly. Pod creation and API calls work fine, but I am running into an issue when trying to delete the Pod.
It appears that the SIGTERM signal is not reaching the NIM server when I request deletion of the InferenceService or the Pod. The internal logs show no shutdown or KILL signals either, and the Pod is only force-killed once it hits terminationGracePeriodSeconds: 300.
Do I need to pass any additional options when starting the NIM server, or is this a known issue?
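I haven't confirmed what the NIM image's entrypoint does internally, but a common cause of this exact symptom is an entrypoint that runs the server as a child process while something else sits at PID 1: Kubernetes delivers SIGTERM only to PID 1, and if that process never forwards the signal, the pod just waits out the full grace period. A minimal sketch of a forwarding wrapper (the function name and CLI handling here are my own illustration, not part of the NIM image):

```python
import signal
import subprocess
import sys

def run_with_signal_forwarding(cmd):
    # Start the server as a child process. If this wrapper is PID 1 in the
    # container, Kubernetes sends SIGTERM here on pod deletion, and the
    # child never sees it unless we forward it explicitly.
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Relay the termination signal so the server can shut down cleanly.
        child.send_signal(signum)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    # Propagate the server's exit code once it stops.
    return child.wait()

if __name__ == "__main__":
    sys.exit(run_with_signal_forwarding(sys.argv[1:]))
```

Alternatively, if the image's entrypoint is a shell script, launching the server with `exec` makes it replace the shell as PID 1 so no forwarding is needed. You can check what is actually running as PID 1 with `kubectl exec <pod> -- ps -o pid,cmd` before the deletion.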
I'm seeing this same behavior when deploying with KServe. Scale-up works fine and is fast, but when the pod is no longer needed, scale-down takes the full five minutes before the pod is removed.