aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Support different input and output types for Model Monitor data quality monitoring jobs #2785

Open caitriggs opened 2 years ago

caitriggs commented 2 years ago

Discussed in https://github.com/aws/sagemaker-python-sdk/discussions/2393

Currently, when data capture is enabled on an endpoint that a scheduled Model Monitor data quality job analyzes, the captured inputs and outputs must use the same encoding. Otherwise, the monitoring job fails with the following error:

Error: Encoding mismatch: Encoding is CSV for endpointInput, but Encoding is JSON for endpointOutput. We currently only support the same type of input and output encoding at the moment.

An example of the captureData record produced when the input is CSV and the output is JSON (seen via a ModelMonitor.list_executions() call):

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0.5629733055182888,0.3018707225866159,0.5824503894753207",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "application/json",
      "mode": "OUTPUT",
      "data": "{\"predictions\": [{\"score\": 0.012620825320482254, \"predicted_label\": 0}]}",
      "encoding": "JSON"
    }
  },
  "eventMetadata": {
    "eventId": "28cc8646-bb47-4a96-92fc-d04fc2651286",
    "inferenceTime": "2021-11-30T04:32:05Z"
  },
  "eventVersion": "0"
}
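
For reference, a minimal sketch (assuming s3_capture_upload_path is the same S3 URI passed to DataCaptureConfig below) that lists the capture files and prints the encodings SageMaker recorded for each request/response pair:

import json
import boto3

# Minimal sketch: walk the capture files (JSON Lines) under the data capture
# prefix and print the input/output encodings recorded for each invocation.
# Assumes s3_capture_upload_path points at the data-capture destination used below.
s3 = boto3.client("s3")
bucket, _, prefix = s3_capture_upload_path.replace("s3://", "").partition("/")
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
    for line in body.splitlines():
        capture = json.loads(line)["captureData"]
        print(capture["endpointInput"]["encoding"], "->", capture["endpointOutput"]["encoding"])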

Please support different input and output content types for Model Monitor data quality monitoring jobs.

There also appears to be no way, in the SageMaker examples or documentation, to force both the endpoint input and output to be captured with the same encoding.

Setting a serializer and deserializer to the same content type when deploying the endpoint does not appear to work either; the endpoint still captures endpointOutput as JSON:

import sagemaker

# Data capture config: capture both request and response, and register
# text/csv as a CSV content type
data_capture_config = sagemaker.model_monitor.DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    capture_options=["REQUEST", "RESPONSE"],
    csv_content_types=["text/csv"],
    destination_s3_uri=s3_capture_upload_path,
    sagemaker_session=sm_sess
)

# Deploy with a CSV serializer and a deserializer that accepts application/json
model.deploy(
    initial_instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    model_name=model_name,
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.CSVDeserializer(accept="application/json"),
    tags=[{'Key': 'demo-configs', 'Value': prefix}]
)

This results in every scheduled data quality monitoring job failing with the same "Encoding mismatch" error.
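
A possible mitigation (untested here) would be to invoke the endpoint so the response comes back as text/csv, assuming the model container honors the Accept header; then both endpointInput and endpointOutput would presumably be captured with CSV encoding. A rough sketch, reusing endpoint_name and sm_sess from the deploy call above:

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Untested sketch: request text/csv back so the capture record's endpointOutput
# encoding matches the CSV input. Only helps if the model container actually
# honors Accept: text/csv and can emit CSV.
predictor = Predictor(
    endpoint_name=endpoint_name,      # same endpoint_name passed to deploy() above
    sagemaker_session=sm_sess,
    serializer=CSVSerializer(),       # request body sent as text/csv
    deserializer=CSVDeserializer(),   # Accept defaults to text/csv
)
result = predictor.predict("0.5629733055182888,0.3018707225866159,0.5824503894753207")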

saivinilpratap-ta commented 2 years ago

Did you find a resolution for this, @caitriggs?

ashishpal2702 commented 1 year ago

Is there any resolution or workaround for this?