zoran-hristov opened 3 years ago
@zoran-hristov Did you find any resolution to this issue? I am facing the same problem. Even after setting TS_DEFAULT_WORKERS_PER_MODEL=2 in config.properties, the change is not reflected in the CloudWatch logs, which clearly show Number of CPUs: 1. I used the same example as in the repo.
Yes, I found a solution. One part is noted in the subsequent Deep Learning Containers release notes (see Known issues), but the images themselves were not fixed. It is related to the OMP_NUM_THREADS environment variable, which regulates OpenMP threading; I suggest setting it to numberOfCPUs/2 or less.
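As a minimal sketch of that suggestion (assuming a Linux environment where `nproc` is available), the cap could be computed at container startup:

```shell
# Sketch: cap OMP_NUM_THREADS at half the visible CPUs, as suggested above.
half=$(( $(nproc) / 2 ))
# Keep at least one thread on single-CPU hosts.
[ "$half" -lt 1 ] && half=1
export OMP_NUM_THREADS=$half
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The same export could also be baked into the image with an ENV or ENTRYPOINT line, depending on how the container is launched.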
The other part is to enable container support for CPU detection, especially for the JVM. So we rebuilt the image with a fix that overrides the JVM arguments.
We set this in the code, since config.properties is not used in the image. I have no explanation for why they abandoned the use of config.properties.
Here is one way to do it, by overwriting the properties files in the Dockerfile:
FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/pytorch-inference:1.7.1-cpu-py36-ubuntu18.04
# If the standard path is not used, patch it with the following lines
RUN echo "vmargs=-XX:-UseContainerSupport" >> /opt/conda/lib/python3.6/site-packages/sagemaker_inference/etc/default-mms.properties
RUN echo "vmargs=-XX:-UseContainerSupport" >> /opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/etc/default-ts.properties
RUN echo "vmargs=-XX:-UseContainerSupport" >> /opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/etc/mme-ts.properties
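As a quick local sanity check (a sketch only; the real files live inside the image at the paths shown above), you can replicate what the RUN lines append against a temporary stand-in file:

```shell
# Sketch: reproduce one of the RUN lines against a temporary stand-in file
# (the actual properties files exist only inside the container image).
props=$(mktemp)
echo "vmargs=-XX:-UseContainerSupport" >> "$props"
# The serving container reads this vmargs line and passes the flag to the JVM.
cat "$props"
```

With -XX:-UseContainerSupport the JVM stops honoring cgroup limits and reports the host's CPU count, which is what makes the full core count visible again.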
Thanks @zoran-hristov, this helped me resolve the issue.
Describe the bug This issue is related to the JVM bug reported in issue 82 of sagemaker-inference-toolkit.
To reproduce
Clone the SageMaker example. Deploy the model to an endpoint. Check the CloudWatch logs: the number of CPU cores detected will appear as Number of CPUs: 1. The JVM detects the CPU count as 1 even when more CPUs are available to the container.
Expected behavior The CPU count in the CloudWatch logs should match the CPU count of the instance in use, for example 4 for ml.m4.xlarge.
System information Containers: pytorch-inference:1.7-cpu-py3 and pytorch-inference:1.7-gpu-py3; SageMaker inference toolkit v1.1.2
Additional context This effectively prevents SageMaker Inference from using all CPUs on the instance.