aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

No DataProcessing in local transform job #4757

Open sephib opened 5 months ago

sephib commented 5 months ago

Describe the bug When running with instance_type="local", the DataProcessing configuration is not applied and all of the input data is sent to the prediction container unfiltered.

In the _perform_batch_inference function in entities.py, the DataProcessing key from the kwargs is never used, so the input_data / item is sent as-is without any filtering.
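For context, here is a minimal sketch of what applying the DataProcessing filters could look like for CSV records in local mode. This is not the SDK's implementation; the filter_input / assemble_output helpers and the tiny JSONPath handling are illustrative assumptions only.

# Illustrative sketch only, not the SDK's actual code. It assumes CSV records
# and the small JSONPath subset ("$", "$[n]") documented for batch transform
# DataProcessing.
import csv
import io


def _select(jsonpath, columns):
    # Apply a single-index JSONPath selector such as "$[4]" to CSV columns.
    if jsonpath in (None, "", "$"):
        return columns
    index = int(jsonpath[2:-1])  # "$[4]" -> 4 (assumption: no slices or ranges)
    return [columns[index]]


def filter_input(record, data_processing):
    # What the container should receive when InputFilter is set.
    columns = next(csv.reader(io.StringIO(record)))
    return ",".join(_select(data_processing.get("InputFilter"), columns))


def assemble_output(record, prediction, data_processing):
    # Join the prediction back onto the input record, then apply OutputFilter.
    columns = next(csv.reader(io.StringIO(record)))
    if data_processing.get("JoinSource") == "Input":
        joined = columns + [prediction]
    else:
        joined = [prediction]
    return ",".join(_select(data_processing.get("OutputFilter"), joined))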

To reproduce

from sagemaker.model import Model
from sagemaker.local import LocalSession
import boto3

model = Model(
    model_data='file://to/my/model_data',
    role='MY_ROLE',
    image_uri='IMAGE_URI',
    sagemaker_session=LocalSession(boto3.Session(region_name='my-region')),
)
transformer = model.transformer(
    instance_count=1,
    instance_type="local",
    strategy="MultiRecord",
    assemble_with="Line",
    output_path="file://my/output/path",
    accept="text/csv",
    max_concurrent_transforms=1,
)
transformer.transform(
    data="file://path/to/my/data/file",
    content_type="text/csv",
    split_type="Line",
    input_filter="$[4]",  # this currently seams not to be working in local mode
    join_source="Input",
    output_filter="$[0]",
)
transformer.wait()

Expected behavior The input CSV should be filtered using the input_filter value before it is sent to the model, and the joined output should be filtered using the output_filter value.
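As a concrete illustration of the expected behavior under the documented CSV DataProcessing semantics, using the hypothetical helpers sketched above:

# Assumes the hypothetical filter_input / assemble_output helpers sketched earlier.
data_processing = {"InputFilter": "$[4]", "JoinSource": "Input", "OutputFilter": "$[0]"}

row = "id-1,3.2,7.9,1.1,0.5"
print(filter_input(row, data_processing))             # -> "0.5"  (only column 4 reaches the model)
print(assemble_output(row, "0.87", data_processing))  # -> "id-1" (joined record filtered to column 0)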

System information A description of your system. Please provide:

mufaddal-rohawala commented 4 months ago

Thanks for reaching out to SageMaker! We are tracking this request internally, and will update on this soon!

lorenzwalthert commented 1 month ago

This is a duplicate of #4095. Maybe upvote the existing issue?