aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Processing job input and output cannot be the same #1950

Closed RoelantStegmann closed 3 years ago

RoelantStegmann commented 4 years ago

Describe the bug

SageMaker requires the input and output of a processing job to be different folders (for no apparent reason).

To reproduce

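from sagemaker.processing import ProcessingInput, ProcessingOutput, Processor

proc = Processor(
    ....
)
proc.run(
    inputs=[
        # Download the S3 prefix into the container
        ProcessingInput(
            source='s3://processing-data/test/',
            destination='/opt/ml/test/',
        ),
    ],
    outputs=[
        # Upload results back to the same S3 prefix
        ProcessingOutput(
            source='/opt/ml/processing/test/',
            destination='s3://processing-data/test',
            s3_upload_mode="Continuous",
        ),
    ],
    arguments=[],
)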

Expected behavior

You would expect this to be allowed, but it runs into a validation error that apparently compares the input and output paths.

chuyang-deng commented 4 years ago

Hi @RoelantStegmann, are you using your own processing container? SageMaker reads input data from, and writes output data to, specific locations in the container. Here is how SageMaker handles input and output: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html#byoc-input-and-output
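As a rough local check of the container paths: to my understanding, the SageMaker API reference requires the container-local path of a processing input or output to begin with /opt/ml/processing/. The sketch below applies that rule to the two local paths from the snippet in this issue. The helper name is hypothetical and not part of the SageMaker SDK, and it is not confirmed that this rule is what triggers the validation error reported here.

```python
import posixpath

# Assumed required prefix for container-local processing paths.
PROCESSING_ROOT = "/opt/ml/processing"


def is_under_processing_root(local_path: str) -> bool:
    """Return True if local_path is a strict subdirectory of PROCESSING_ROOT."""
    normalized = posixpath.normpath(local_path)
    return normalized.startswith(PROCESSING_ROOT + "/")


# Container-local paths copied from the snippet in this issue.
input_destination = "/opt/ml/test/"
output_source = "/opt/ml/processing/test/"

print(is_under_processing_root(input_destination))  # False
print(is_under_processing_root(output_source))      # True
```

If the rule applies, the input destination in the snippet would need to move under /opt/ml/processing/ regardless of how the S3 locations are compared.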

Also, could you provide the stack trace of the validation error, so that we can see whether this validation is necessary and whether it can be re-evaluated at the Python SDK level?
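One way to capture the full stack trace as text for pasting into the issue is the standard-library traceback module. This is a minimal sketch: run_and_capture and failing_call are hypothetical names, and failing_call is only a stand-in for the proc.run(...) call so that the example runs without AWS access.

```python
import traceback


def run_and_capture(fn):
    """Call fn(); return the formatted stack trace if it raises, else None."""
    try:
        fn()
    except Exception:
        return traceback.format_exc()
    return None


def failing_call():
    # Stand-in for proc.run(...); raises so the example runs without AWS access.
    raise ValueError("placeholder validation error")


trace = run_and_capture(failing_call)
print(trace)
```

In a real session, passing a function that wraps the proc.run(...) call in place of failing_call would return the text to include in the report.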