aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

No DataProcessing in local transform job #4757

Open sephib opened 5 months ago

sephib commented 5 months ago

Describe the bug When running with instance_type="local", the DataProcessing configuration is not applied and all of the input data is sent to the prediction container unfiltered.

In the _perform_batch_inference function in entities.py, the DataProcessing key from the kwargs is never used, so the input_data / item is sent as-is without any filtering.
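For context, here is a minimal sketch of what applying the DataProcessing filters could look like for CSV records in local mode. This is not the SDK's implementation; the filter_input / assemble_output helpers and the tiny JSONPath handling are illustrative assumptions only.

# Illustrative sketch only, not the SDK's actual code. It assumes CSV records
# and the small JSONPath subset ("$", "$[n]") documented for batch transform
# DataProcessing.
import csv
import io


def _select(jsonpath, columns):
    # Apply a single-index JSONPath selector such as "$[4]" to CSV columns.
    if jsonpath in (None, "", "$"):
        return columns
    index = int(jsonpath[2:-1])  # "$[4]" -> 4 (assumption: no slices or ranges)
    return [columns[index]]


def filter_input(record, data_processing):
    # What the container should receive when InputFilter is set.
    columns = next(csv.reader(io.StringIO(record)))
    return ",".join(_select(data_processing.get("InputFilter"), columns))


def assemble_output(record, prediction, data_processing):
    # Join the prediction back onto the input record, then apply OutputFilter.
    columns = next(csv.reader(io.StringIO(record)))
    if data_processing.get("JoinSource") == "Input":
        joined = columns + [prediction]
    else:
        joined = [prediction]
    return ",".join(_select(data_processing.get("OutputFilter"), joined))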

To reproduce

from sagemaker.model import Model
from sagemaker.local import LocalSession
import boto3

model = Model(
    model_data='file://to/my/model_data',
    role='MY_ROLE',
    image_uri='IMAGE_URI',
    sagemaker_session=LocalSession(boto3.Session(region_name='my-region')),
)
transformer = model.transformer(
    instance_count=1,
    instance_type="local",
    strategy="MultiRecord",
    assemble_with="Line",
    output_path="file://my/output/path",
    accept="text/csv",
    max_concurrent_transforms=1,
)
transformer.transform(
    data="file://path/to/my/data/file",
    content_type="text/csv",
    split_type="Line",
    input_filter="$[4]",  # this currently seams not to be working in local mode
    join_source="Input",
    output_filter="$[0]",
)
transformer.wait()

Expected behavior The input CSV should be filtered using the input_filter value before it is sent to the model, and the joined output should be filtered using the output_filter value.
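As a concrete illustration of the expected behavior under the documented CSV DataProcessing semantics, using the hypothetical helpers sketched above:

# Assumes the hypothetical filter_input / assemble_output helpers sketched earlier.
data_processing = {"InputFilter": "$[4]", "JoinSource": "Input", "OutputFilter": "$[0]"}

row = "id-1,3.2,7.9,1.1,0.5"
print(filter_input(row, data_processing))             # -> "0.5"  (only column 4 reaches the model)
print(assemble_output(row, "0.87", data_processing))  # -> "id-1" (joined record filtered to column 0)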

System information A description of your system. Please provide:

mufaddal-rohawala commented 4 months ago

Thanks for reaching out to SageMaker! We are tracking this request internally, and will update on this soon!

lorenzwalthert commented 1 month ago

This is a duplicate of #4095. Maybe upvote the existing issue?