aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

fix: add SIGCHILD Handler for MMS #76

Closed dhanainme closed 3 years ago

dhanainme commented 3 years ago

MMS worker processes are not cleaned up when the Python process is killed with SIGKILL in a Docker environment.

This fix adds a SIGCHLD handler for MMS so that worker cleanup happens when a model is unregistered.
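
As a rough illustration of the idea (not the code in this PR), a SIGCHLD handler can reap exited children so they do not linger as zombies. The sketch below assumes a POSIX system and a handler installed from the main thread; `reap_children`, `install_sigchld_handler`, and `spawn_short_lived_child` are hypothetical names used only for this example.

```python
import os
import signal
import time


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no child processes remain
        if pid == 0:
            return  # children exist, but none has exited yet


def install_sigchld_handler():
    """Register the handler (signal.signal must run in the main thread)."""
    signal.signal(signal.SIGCHLD, reap_children)


def spawn_short_lived_child():
    """Hypothetical stand-in for a worker: fork a child that exits at once."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit without running parent cleanup code
    return pid
```

The loop matters because several children exiting close together may be reported by a single SIGCHLD delivery, so one `waitpid` call per signal could leave some of them unreaped.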

Testing done:

Repeatedly registering and unregistering a model on a container with this fix does not leave any zombie processes:

while true;
do
    curl -X POST "localhost:8080/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar&initial_workers=3&synchronous=true"
    sleep 10
    curl -X DELETE "localhost:8080/models/squeezenet_v1.1"
    sleep 10
done
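
One way to check for leftover zombies between iterations of a loop like this is to scan `/proc` for processes in state `Z`. This is a minimal sketch that assumes a Linux container with `/proc` mounted; `state_from_stat` and `count_zombies` are hypothetical helpers, not part of MMS or this toolkit.

```python
import os


def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_zombies(proc_root="/proc"):
    """Count processes whose state is 'Z' (zombie). Linux only."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                if state_from_stat(handle.read()) == "Z":
                    zombies += 1
        except OSError:
            continue  # the process exited while we were scanning
    return zombies
```

Calling `count_zombies()` inside the container after each DELETE request should keep returning 0 if the workers are being reaped.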

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 3 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

dhanainme commented 3 years ago

Hi!

The CI Docker build has been failing during dependency resolution. Can someone take a look at this?

INFO: pip is looking at multiple versions of sagemaker-inference to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of gluonnlp to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of aws-mxnet-mkl to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install sagemaker-inference 1.2.2.dev0 (from /sagemaker_inference.tar.gz) and sagemaker-mxnet-inference==1.3.1.dev0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested sagemaker-inference 1.2.2.dev0 (from /sagemaker_inference.tar.gz)
    sagemaker-mxnet-inference 1.3.1.dev0 depends on sagemaker-inference<=1.2.0 and >=1.1.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
The command '/bin/sh -c ${PIP} install --no-cache-dir     ${MX_URL}     git+git://github.com/dmlc/gluon-nlp.git@v0.9.0     multi-model-server==$MMS_VERSION     keras-mxnet==2.2.4.1     numpy==1.17.4     onnx==1.4.1     /sagemaker_mxnet_inference.tar.gz     /sagemaker_inference.tar.gz  && rm /sagemaker_mxnet_inference.tar.gz /sagemaker_inference.tar.gz' returned a non-zero code: 1
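
The conflict in the log above comes down to the locally built version falling outside the pinned range. A simplified sketch of that comparison follows; it only compares the leading numeric components and is not a PEP 440 implementation (the `packaging` library would be the usual choice for real checks). `release_tuple` and `within_range` are hypothetical helpers.

```python
def release_tuple(version):
    """Keep only the leading numeric parts, e.g. '1.2.2.dev0' -> (1, 2, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def within_range(version, lower, upper):
    """True if lower <= version <= upper, comparing release numbers only."""
    return release_tuple(lower) <= release_tuple(version) <= release_tuple(upper)


# sagemaker-mxnet-inference 1.3.1.dev0 requires sagemaker-inference>=1.1.0,<=1.2.0,
# while the build installs sagemaker-inference 1.2.2.dev0 from the local tarball.
conflict = not within_range("1.2.2.dev0", "1.1.0", "1.2.0")
```

Here `conflict` evaluates to `True` because release 1.2.2 is above the 1.2.0 upper bound, which matches the first suggestion in the pip output: loosen the version range.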