aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker Pytorch Containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

No model logs from PyTorch 1.10 SageMaker endpoint #116

Closed: setu4993 closed this issue 2 years ago

setu4993 commented 2 years ago

Describe the bug

No model logs show up for endpoints created with PyTorch 1.10. Logging works fine after switching to PyTorch 1.9 / 1.9.1, but not with 1.10.

Dependencies install correctly and the model loads, but no logs from the container are forwarded to CloudWatch.
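To make the symptom concrete, here is a minimal sketch of the kind of logging involved. It assumes the inference script uses Python's standard `logging` module and writes to stdout, which the container would normally be expected to relay to CloudWatch. The logger name `inference` and the helper `get_logger` are hypothetical and do not come from this issue.

```python
import logging
import sys


def get_logger(name: str = "inference") -> logging.Logger:
    """Return a logger that writes INFO-level messages to stdout.

    Hypothetical helper: the name and message format are illustrative only.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = get_logger()
# Per the report above, messages like this one did not show up in CloudWatch
# for endpoints running the PyTorch 1.10 image.
logger.info("model loaded")
```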

I tried updating to 2.8.0 within the container, but that doesn't work: the properties file is different, and it fails while trying to find log4j.properties.

Expected behavior

Logs should be forwarded to CloudWatch.
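One way to check whether anything is reaching CloudWatch is to query the endpoint's log group directly. The sketch below assumes the usual `/aws/sagemaker/Endpoints/<endpoint-name>` log group naming for real-time endpoints and uses the boto3 CloudWatch Logs client's `filter_log_events` call. The helper names and the endpoint name `my-endpoint` are placeholders, not taken from this issue.

```python
from typing import Any, List


def endpoint_log_group(endpoint_name: str) -> str:
    """Build the log group name assumed for a SageMaker real-time endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def recent_log_messages(logs_client: Any, endpoint_name: str, limit: int = 20) -> List[str]:
    """Return up to `limit` log messages for the endpoint (empty list if none).

    `logs_client` is expected to behave like boto3.client("logs").
    """
    response = logs_client.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]


# Usage (requires AWS credentials; "my-endpoint" is a placeholder):
#   import boto3
#   print(recent_log_messages(boto3.client("logs"), "my-endpoint"))
```

An empty list would be consistent with the behaviour described here. If the log group does not exist at all, the real client raises an error instead of returning an empty result.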

Screenshots or logs

[Screenshot: Screen Shot 2022-02-25 at 13 37 19]

SamShowalter commented 2 years ago

Any update on this? I am also seeing this issue with both torch 1.9 and 1.10, and my code is not compatible with the older images.

setu4993 commented 2 years ago

@SamShowalter: I didn't get any traction here, but after I raised it with AWS Support, they told me a couple of weeks ago (and I have since verified) that logs work fine for the PyTorch 1.10.2 and PyTorch 1.11 images. Those images were released / updated in the last couple of months, according to this doc.

We also saw the same issue of missing logs on the 1.9 images, but we have since transitioned to 1.10 / 1.11, so we haven't followed up.