Closed sachanub closed 8 months ago
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Issue #, if available:
Description of changes:
Due to the upgrade of `psutil` from version 5.9.5 to 5.9.6, customers started facing a `ZombieProcess` error. From version 5.9.6, `psutil` correctly raises `ZombieProcess` on `Process.exe()`, `Process.cmdline()` and `Process.memory_maps()` instead of returning a "null" value. As a result, customers started hitting the `ZombieProcess` exception due to this line in `model_server.py`: https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py#L276

The proposed fix is as follows:
While iterating through the processes via `psutil.process_iter()`, we check the status of each process before checking for the presence of `MMS_NAMESPACE` in `process.cmdline()`. If the process has the zombie status, we skip the `MMS_NAMESPACE` check for that process, which avoids the `ZombieProcess` exception.

Testing done:

Launched a BYOC container (it uses `sagemaker_inference` in the entrypoint to start MMS). Without the fix, the container stops after 5-10 minutes with the above-mentioned exception. With the fix, the container did not stop (I kept it running for a few hours).

UPDATE: Added a 60-second sleep in the local integration tests after the container is started, to give the model server time to start up fully. This was needed because the local integration tests were failing when they ran before the model server had finished starting.
Merge Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General
Tests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.