Open ldong87 opened 4 years ago
Has anyone been able to solve it? Having the same problem.
Having the same issue when returning a single 1024x1024 image.
Are you using PyTorch? In my case, it was the netty max response size in the AWS PyTorch image, and you can increase it. The AWS PyTorch image uses TorchServe, and TorchServe uses netty internally; you can see "netty" in your error message. TorchServe is configured through config.properties, which already exists in the AWS PyTorch image.
In this line, /home/model-server/config.properties is provided as the TorchServe config, so all you have to do is add custom settings to that file. You can do this by extending the container with the following steps:
1. Add enable_envvars_config=true to /home/model-server/config.properties.
2. Set TS_MAX_RESPONSE_SIZE to a large value.
If you set enable_envvars_config=true, you can set every property with an environment variable named TS_<PROPERTY_NAME>. Environment variables can be set using the SageMaker SDK, so it is better to use env vars than to write properties directly into the config file. According to the TorchServe documentation, max_response_size is "the maximum allowable response size that the Torchserve sends, in bytes".
FROM <AWS_PYTORCH_IMAGE_URI>
RUN echo "enable_envvars_config=true" >> /home/model-server/config.properties
# Then set TS_MAX_RESPONSE_SIZE=655350000 (or as large a value as you need) as an env var using the SageMaker SDK
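For the env-var step, a minimal sketch with the SageMaker SDK (the bucket, role, serving script, and image URI below are placeholders, assuming the extended image has been pushed to ECR):

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",               # placeholder artifact location
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    entry_point="inference.py",                             # placeholder serving script
    framework_version="1.13",
    py_version="py39",
    image_uri="<YOUR_EXTENDED_IMAGE_URI>",                  # the image built from the Dockerfile above
    env={"TS_MAX_RESPONSE_SIZE": "655350000"},              # read by TorchServe because enable_envvars_config=true
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")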
I'm having the same problem. I don't understand how I can set enable_envvars_config=true from the notebook (if there is a way). Do you have any ideas?
Thanks
@stevinc I think it is best to build a new image. Write a Dockerfile like the example in my previous comment, build it, and upload it to ECR. You can then use the image with the SageMaker SDK via the image_uri parameter of sagemaker.estimator.Estimator or sagemaker.estimator.Framework.
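As a rough illustration of that image_uri parameter (the URI and role below are placeholders, and only the minimum arguments are shown):

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-extended-pytorch:latest",  # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",                                # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

On the inference side, the framework model classes such as sagemaker.pytorch.PyTorchModel accept the same image_uri parameter.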
@grraffe Thanks. I built a new image with a custom Dockerfile where I load my modified config.properties, and now it works.
It seems like enable_envvars_config is already preset to true by AWS. I could pass TS_MAX_RESPONSE_SIZE as an environment variable, as shown in the code below, and that resolved the above error.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    assemble_with='Line',
    accept='application/jsonlines',
    strategy='MultiRecord',
    max_payload=6,
    env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600', 'TS_MAX_RESPONSE_SIZE': '20000000'}
)
Depending on the serving tool you use, i.e. TorchServe (TS) or Multi Model Server (MMS), you can change the maximum request/response size by setting the TS_MAX_RESPONSE_SIZE / MMS_MAX_RESPONSE_SIZE / ... environment variables when creating or deploying your model in SageMaker. See further details in my response here for MMS.
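For an MMS-based image, the same pattern would look roughly like this, a sketch with placeholder values using the generic sagemaker.model.Model class:

from sagemaker.model import Model

model = Model(
    image_uri="<MMS_BASED_IMAGE_URI>",                      # placeholder
    model_data="s3://my-bucket/model.tar.gz",               # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    env={
        "MMS_MAX_RESPONSE_SIZE": "20000000",  # MMS reads the MMS_ prefix instead of TS_
        "MMS_MAX_REQUEST_SIZE": "20000000",
    },
)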
Had the same issue for realtime endpoints; passing TS_MAX_RESPONSE_SIZE as an env variable solved this as well.
@laphang @vprecup I am having this issue as well, and I am wondering how you are passing these environment variables to your SageMaker PyTorch model?
estimator = PyTorch(
    entry_point="train.py",
    source_dir="sagemaker_container_files",
    role=role,
    py_version="py39",
    framework_version="1.13",
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    hyperparameters=hyperparameters,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '2000000000',
        'TS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
    }
)
Is this a valid way to pass in these env variables? Or do I do it when I deploy the model, with:
estimator.deploy(.....)
Or at some other point?
@levatas-weston yes, pass them to the PyTorchModel; see the base class docs below. https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model
If anyone else in the future is wondering, the following is what solved my issue:
estimator.deploy(
    ........,
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'TS_MAX_RESPONSE_SIZE': '2000000000',
        'TS_MAX_REQUEST_SIZE': '2000000000',
        'MMS_MAX_RESPONSE_SIZE': '2000000000',
        'MMS_MAX_REQUEST_SIZE': '2000000000',
    }
)
My understanding is that the container SageMaker creates for training and the container it creates for deployment are completely separate, so you only need these env variables in the deployment container (the one running TorchServe).
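One way to make that separation explicit (a sketch assuming the estimator has already been fit; the serving script and instance type are placeholders) is to build the inference model from the training artifacts and attach the env vars only there:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data=estimator.model_data,  # artifacts produced by the separate training container
    role=role,
    entry_point="inference.py",       # placeholder serving script
    framework_version="1.13",
    py_version="py39",
    env={
        "TS_MAX_RESPONSE_SIZE": "2000000000",
        "TS_MAX_REQUEST_SIZE": "2000000000",
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")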
What are arguments like "max_payload" for in
transformer = pytorch_model.transformer(instance_count=4, instance_type="ml.g5.xlarge", strategy="MultiRecord", max_payload=100)
if we just need to configure the TorchServe environment variables directly? I can't get SageMaker to honor the max_payload configs supplied through its own APIs, which is quite confusing: the payload size of a single image is 9 MB, I increased the limit, and I still get 413 errors.
From the CloudWatch logs:
2024-10-16T22:49:21.569:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=100, BatchStrategy=MULTI_RECORD
2024-10-16T22:49:21.763:[sagemaker logs]: sagemaker-us-west-2-058264276765/benchmarks/object_detection_5000/part-00000-4316c05b-d740-4823-a8da-80afbd86e5ac-c000/1145684724-8-7.tiff: ClientError: 413
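One plausible reading of that log (an assumption, not something confirmed in this thread): MaxPayloadInMB=100 shows the SageMaker-level limit was applied, while the 413 is returned by the model server inside the container, which enforces its own max_request_size / max_response_size. Under that assumption, both knobs need raising, roughly:

transformer = pytorch_model.transformer(
    instance_count=4,
    instance_type="ml.g5.xlarge",
    strategy="MultiRecord",
    max_payload=100,                          # SageMaker-level payload cap, in MB
    env={
        "TS_MAX_REQUEST_SIZE": "104857600",   # container-level TorchServe cap, in bytes (100 MB)
        "TS_MAX_RESPONSE_SIZE": "104857600",
    },
)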
Describe the feature you'd like I'm running an NLP inference job to get a sentence embedding vector for each record. Each record is less than 512 words and the returned vector has 768 floats. Even if I set max_payload to 1 and max_concurrent_transforms to 1, I still got: io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 113038147
I know there is a 5MB limit for endpoint inference. I'm relatively sure this is caused by a size limit on the results each batch transform request returns, though I'm not able to find it in the docs.
I hope you can make this limit configurable.
How would this feature be used? Add a max_return_payload parameter to model.transformer, like below.
tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='MultiRecord',
    max_payload=3,
    max_return_payload=30,
)
Describe alternatives you've considered
Using strategy='SingleRecord' bypasses the issue, but it is significantly slower because it doesn't take advantage of parallelism. Setting max_concurrent_transforms to a larger value does exploit parallelism, but it could cause problems for code that isn't designed for concurrency. The workaround is sketched below.
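For reference, the SingleRecord workaround is the same call as the proposed usage above with only the strategy changed (a sketch reusing the same placeholder arguments):

tfm = model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    accept='text/json',
    assemble_with='Line',
    output_path=batch_out,
    strategy='SingleRecord',  # one record per request keeps each response small
    max_payload=3,
)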
Additional context
A similar issue was raised here by other people: https://github.com/aws/sagemaker-python-sdk/issues/1096