aws-neuron / aws-neuron-sagemaker-samples

MIT No Attribution
Updated container image due to model deployment failing. #8

Closed. vjaramillo closed this 2 months ago

vjaramillo commented 2 months ago

Issue #, if available:

Description of changes:

Updated the ECR image. In particular, the PyTorch version goes from 1.13.1 to 2.1.2 and the neuronx SDK from 2.13.2 to 2.18.1.

With the previous image, model deployment failed with the error "worker died".

Added the variable model_server_workers = 2 so that the model server does not start more workers than there are Neuron cores, which was causing additional errors in the CloudWatch logs.
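
For reviewers, here is a minimal sketch of the idea behind the worker setting, not the exact code in this PR. The helper names (cap_workers, build_model_kwargs), the placeholder image URI, model path, and role, and the assumption of 2 Neuron cores are all hypothetical. The resulting dictionary is meant to resemble the keyword arguments one could pass to the SageMaker Python SDK's PyTorchModel, which accepts a model_server_workers parameter.

```python
def cap_workers(requested_workers: int, neuron_cores: int) -> int:
    """Return a worker count that never exceeds the number of Neuron cores."""
    if requested_workers < 1 or neuron_cores < 1:
        raise ValueError("worker and core counts must be positive")
    return min(requested_workers, neuron_cores)


def build_model_kwargs(image_uri, model_data, role, neuron_cores, requested_workers):
    """Assemble keyword arguments for a model definition (hypothetical helper)."""
    return {
        "image_uri": image_uri,
        "model_data": model_data,
        "role": role,
        # One worker per Neuron core at most, so no worker is left without a core.
        "model_server_workers": cap_workers(requested_workers, neuron_cores),
    }


# Placeholder values; substitute the real image, artifact, and role.
kwargs = build_model_kwargs(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>",
    model_data="s3://<bucket>/<prefix>/model.tar.gz",
    role="<execution-role-arn>",
    neuron_cores=2,        # assumption: the target instance exposes 2 Neuron cores
    requested_workers=4,   # assumption: a default that would oversubscribe the cores
)
print(kwargs["model_server_workers"])
```

With these assumed values the worker count is capped at 2, matching the value hard-coded in this change. Computing the cap from the core count is one alternative to a fixed constant if the instance type changes later.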

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.