Open ldong87 opened 4 years ago
Has anyone been able to solve it? Having the same problem.
Having the same issue when returning a single 1024x1024 image.
Are you using PyTorch? In my case, it was the netty max response size in the AWS PyTorch image, and you can increase it. The AWS PyTorch image uses TorchServe, and TorchServe uses netty internally; you can see "netty" in your error message. TorchServe is configured through config.properties, which already exists in the AWS PyTorch image.
In this line, /home/model-server/config.properties is provided as the TorchServe config, so all you have to do is add custom settings to that file. You can do this by extending the container with the following steps:
1. Add enable_envvars_config=true to /home/model-server/config.properties.
2. Set TS_MAX_RESPONSE_SIZE to a large value.
If you set enable_envvars_config=true, you can set every property with an environment variable named TS_<PROPERTY_NAME>. Environment variables can be set using the SageMaker SDK, so it is better to use env vars than to write properties directly into the config file. According to the TorchServe documentation, max_response_size is "the maximum allowable response size that the Torchserve sends, in bytes".
FROM <AWS_PYTORCH_IMAGE_URI>
RUN echo "enable_envvars_config=true" >> /home/model-server/config.properties
# Then set TS_MAX_RESPONSE_SIZE=655350000 (or as large a value as you need) as an env var using the SageMaker SDK
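For the env-var step, a minimal sketch with the SageMaker SDK (the bucket, role, serving script, and image URI below are placeholders, assuming the extended image has been pushed to ECR):

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",               # placeholder artifact location
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    entry_point="inference.py",                             # placeholder serving script
    framework_version="1.13",
    py_version="py39",
    image_uri="<YOUR_EXTENDED_IMAGE_URI>",                  # the image built from the Dockerfile above
    env={"TS_MAX_RESPONSE_SIZE": "655350000"},              # read by TorchServe because enable_envvars_config=true
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")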
I'm having the same problem. I don't understand how I can set enable_envvars_config=true from the notebook (if there is a way). Do you have any ideas?
Thanks
@stevinc I think it is best to build a new image. Write a Dockerfile like the example in my previous comment, build it, and upload it to ECR. You can then use the image with the SageMaker SDK via the image_uri parameter of sagemaker.estimator.Estimator or sagemaker.estimator.Framework.
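As a rough illustration of that image_uri parameter (the URI and role below are placeholders, and only the minimum arguments are shown):

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-extended-pytorch:latest",  # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",                                # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

On the inference side, the framework model classes such as sagemaker.pytorch.PyTorchModel accept the same image_uri parameter.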
@grraffe Thanks. I built a new image with a custom Dockerfile where I load my modified config.properties, and now it works.
It seems like enable_envvars_config is already preset to true by AWS. I could pass TS_MAX_RESPONSE_SIZE as an environment variable, as shown in the code below, and that resolved the above error.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    assemble_with='Line',
    accept='application/jsonlines',
    strategy='MultiRecord',
    max_payload=6,
    env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600', 'TS_MAX_RESPONSE_SIZE': '20000000'}
)
Depending on the serving tool you use, i.e. TorchServe (TS) or Multi Model Server (MMS), you can change the maximum request/response size by setting the TS_MAX_RESPONSE_SIZE / MMS_MAX_RESPONSE_SIZE / ... environment variables when creating or deploying your model in SageMaker. See further details in my response here for MMS.
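For an MMS-based image, the same pattern would look roughly like this, a sketch with placeholder values using the generic sagemaker.model.Model class:

from sagemaker.model import Model

model = Model(
    image_uri="<MMS_BASED_IMAGE_URI>",                      # placeholder
    model_data="s3://my-bucket/model.tar.gz",               # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    env={
        "MMS_MAX_RESPONSE_SIZE": "20000000",  # MMS reads the MMS_ prefix instead of TS_
        "MMS_MAX_REQUEST_SIZE": "20000000",
    },
)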
Had the same issue for realtime endpoints; passing TS_MAX_RESPONSE_SIZE as an env variable solved this as well.
@laphang @vprecup I am having this issue as well, and I am wondering how you are passing these environment variables to your SageMaker PyTorch model?
estimator = PyTorch(
    entry_point="train.py",
    source_dir="sagemaker_container_files",
    role=role,
    py_version="py39",
    framework_version="1.13",
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    hyperparameters=hyperparameters,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '2000000000',
        'TS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
    }
)
Is this a valid way to pass in these env variables? Or do I do it when I deploy the model, with:
estimator.deploy(.....)
Or at some other point?
@levatas-weston yes, pass them to the PyTorchModel; see the base class docs below. https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model
If anyone else in the future is wondering, the following is what solved my issue:
estimator.deploy(
    ........,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '2000000000',
        'TS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
    }
)
My understanding is that the container SageMaker creates for training and the container it creates for deployment are completely separate, so you only need these env variables in the deployment container (the one running TorchServe).
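One way to make that separation explicit (a sketch assuming the estimator has already been fit; the serving script and instance type are placeholders) is to build the inference model from the training artifacts and attach the env vars only there:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data=estimator.model_data,  # artifacts produced by the separate training container
    role=role,
    entry_point="inference.py",       # placeholder serving script
    framework_version="1.13",
    py_version="py39",
    env={
        "TS_MAX_RESPONSE_SIZE": "2000000000",
        "TS_MAX_REQUEST_SIZE": "2000000000",
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")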
What are arguments like "max_payload" for in
transformer = pytorch_model.transformer(instance_count=4, instance_type="ml.g5.xlarge", strategy="MultiRecord", max_payload=100)
if we just need to configure the TorchServe environment variables directly? I can't get SageMaker to honor the max_payload configs supplied through its own APIs, which is quite confusing: the payload size of a single image is 9 MB, I increased the limit, and I still get 413 errors.
From the CloudWatch logs:
2024-10-16T22:49:21.569:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=100, BatchStrategy=MULTI_RECORD
2024-10-16T22:49:21.763:[sagemaker logs]: sagemaker-us-west-2-058264276765/benchmarks/object_detection_5000/part-00000-4316c05b-d740-4823-a8da-80afbd86e5ac-c000/1145684724-8-7.tiff: ClientError: 413
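One plausible reading of that log (an assumption, not something confirmed in this thread): MaxPayloadInMB=100 shows the SageMaker-level limit was applied, while the 413 is returned by the model server inside the container, which enforces its own max_request_size / max_response_size. Under that assumption, both knobs need raising, roughly:

transformer = pytorch_model.transformer(
    instance_count=4,
    instance_type="ml.g5.xlarge",
    strategy="MultiRecord",
    max_payload=100,                          # SageMaker-level payload cap, in MB
    env={
        "TS_MAX_REQUEST_SIZE": "104857600",   # container-level TorchServe cap, in bytes (100 MB)
        "TS_MAX_RESPONSE_SIZE": "104857600",
    },
)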
Describe the feature you'd like I'm running an NLP inference job to get a sentence embedding vector for each record. Each record is less than 512 words and the returned vector has 768 floats. Even if I set max_payload to 1 and max_concurrent_transforms to 1, I still got: io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 113038147
I know there is a 5MB limit for endpoint inference. I'm relatively sure this is caused by a size limit on the results each batch transform request returns, though I'm not able to find it in the docs.
I hope you can make this limit configurable.
How would this feature be used? Add a max_return_payload parameter to model.transformer, like below.
tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='MultiRecord',
    max_payload=3,
    max_return_payload=30,
)
Describe alternatives you've considered
Using strategy='SingleRecord' bypasses the issue, but it is significantly slower because it doesn't take advantage of parallelism. Setting max_concurrent_transforms to a larger value does exploit parallelism, but it could cause problems for code that isn't designed for concurrency. The workaround is sketched below.
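For reference, the SingleRecord workaround is the same call as the proposed usage above with only the strategy changed (a sketch reusing the same placeholder arguments):

tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='SingleRecord',  # one record per request keeps each response small
    max_payload=3,
)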
Additional context
A similar issue was raised here by other people: https://github.com/aws/sagemaker-python-sdk/issues/1096