aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

UTF8 Decoding depends on customer content type #10

Open · ericangelokim opened 4 years ago

ericangelokim commented 4 years ago

https://github.com/aws/sagemaker-inference-toolkit/blob/24db871b1b193ac1a924c21be8c3ec48853b3263/src/sagemaker_inference/transformer.py#L68

This line is not defensive: if a customer supplies an incorrect content_type, the decode can fail, and there is currently no way to return a meaningful error response to the customer.
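
For context, the linked line decodes the request body as UTF-8 whenever the declared content type is a text type, without catching decode errors. A minimal sketch of the failure mode, using a hypothetical payload (the unconditional `decode` call mirrors the linked line):

```python
# Minimal reproduction sketch: the client declares a text content type
# (e.g. text/csv) but sends bytes that are not valid UTF-8, such as
# Latin-1 encoded text. The payload below is hypothetical.
payload = "id,name\n1,café\n".encode("latin-1")

try:
    # Mirrors the unconditional decode on the linked line: the toolkit
    # sees a text content type and decodes with no fallback.
    decoded = payload.decode("utf-8")
except UnicodeDecodeError as err:
    # The exception surfaces as an unhandled server-side error rather
    # than a clear client-facing message about the encoding mismatch.
    print(f"decode failed: {err}")
```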

ChoiByungWook commented 4 years ago

@ericangelokim,

What do you recommend?

I think when this was written, it was meant to mimic this line in the SageMaker Containers SDK: https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_worker.py#L196

Can you reference an issue as an example?

Feel free to ping me if it is private.

Thanks!

ericangelokim commented 4 years ago

I think you need to leave this validation to the customer. For example (a case I hit personally), this line fails when the payload uses another encoding but the customer sets content_type to CSV. There isn't a good way to respond to the client with the root cause of the error.
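
One possible shape for handing this back to the caller, as a hedged sketch only (the helper name and where it would sit in the handler chain are illustrative, not part of the toolkit's API): catch the decode failure and raise an error that can be reported to the client instead of surfacing as an opaque 500.

```python
def decode_text_payload(data: bytes, content_type: str) -> str:
    """Hypothetical helper: decode a text payload, surfacing encoding
    mismatches as a client-reportable error instead of a raw traceback."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError(
            f"Payload declared as {content_type} is not valid UTF-8 "
            f"(first bad byte at offset {err.start}); check the request "
            "encoding or the declared content type."
        ) from err
```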

I am working on another PR that adds error handling back to the client; we can land that first and see whether this situation changes.