Describe the feature you'd like
Support for the Parquet MIME type in the SageMaker inference toolkit. E.g., in the README of this repo, there is an example default_input_fn():
def default_input_fn(self, input_data, content_type, context=None):
    """A default input_fn that can handle JSON, CSV and NPZ formats.

    Args:
        input_data: the request payload serialized in the content_type format
        content_type: the request content_type
        context (obj): the request context (default: None).

    Returns: input_data deserialized into torch.FloatTensor or
        torch.cuda.FloatTensor depending if cuda is available.
    """
    return decoder.decode(input_data, content_type)
Looking into decoder.decode, I can see which MIME types are currently supported. It should not be too hard to add parquet there. Parquet is a data format commonly used with large datasets, and it is already supported in other SageMaker services, for example Autopilot.
How would this feature be used? Please describe.
It would reduce storage and data I/O costs, and increase processing speed.
Describe alternatives you've considered
CSV is the standard, but it is a much less efficient way to store, read, and write column-oriented data.