aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Exception with ParameterString in PySparkProcessor.run() Method #3425

Open dipanjank opened 1 year ago

dipanjank commented 1 year ago

Describe the bug
If I use a ParameterString (or any other PipelineVariable) in the list passed as the arguments parameter of the PySparkProcessor.run() method, I get a TypeError (TypeError: Object of type ParameterString is not JSON serializable).

According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?

To reproduce


    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString

    # role, bucket, sagemaker_session, input_prefix_abalone and
    # input_preprocessed_prefix_abalone are defined earlier in the script
    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
        max_runtime_in_seconds=1200,
    )

    spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=[
            "--s3_input_bucket",
            ParameterString(name="s3-input-bucket", default_value=bucket),
            "--s3_input_key_prefix",
            input_prefix_abalone,
            "--s3_output_bucket",
            bucket,
            "--s3_output_key_prefix",
            input_preprocessed_prefix_abalone,
        ],
    )
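
For comparison, here is a minimal sketch of the pipeline-definition pattern, where a ParameterString is resolved at pipeline execution time instead of being serialized by a direct run() call. It reuses role from the snippet above, trims the argument list, and uses made-up pipeline/step names; it is not a confirmed fix for this error:

    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.steps import ProcessingStep

    pipeline_session = PipelineSession()
    s3_input_bucket = ParameterString(name="s3-input-bucket", default_value="my-default-bucket")

    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,  # same IAM role as in the snippet above
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=pipeline_session,  # PipelineSession instead of a regular Session
        max_runtime_in_seconds=1200,
    )

    # With a PipelineSession, run() does not start a job; it returns step
    # arguments that are compiled into the pipeline definition, so the
    # ParameterString is only resolved when the pipeline executes.
    step_args = spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=["--s3_input_bucket", s3_input_bucket],
    )

    step = ProcessingStep(name="spark-preprocess", step_args=step_args)
    pipeline = Pipeline(
        name="spark-preprocess-pipeline",
        parameters=[s3_input_bucket],
        steps=[step],
        sagemaker_session=pipeline_session,
    )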

Expected behavior

Expect a SageMaker ProcessingJob to be created.

Screenshots or logs

Traceback (most recent call last):
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable


OwenAshton commented 5 months ago

Any update on this issue? I'm getting the same problem when using any ScriptProcessor.

The only workaround is to go back to a loaded ProcessingStep(), which has now been marked as deprecated.
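
For anyone hitting this, a minimal sketch of that deprecated ProcessingStep pattern, using an SKLearnProcessor and a hypothetical preprocess.py script for illustration (the role, processor settings and names here are assumptions, not from this thread):

    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import ProcessingStep

    s3_input_bucket = ParameterString(name="s3-input-bucket", default_value="my-default-bucket")

    processor = SKLearnProcessor(
        framework_version="1.0-1",
        role=role,  # assumed IAM role
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )

    # Deprecated-style step: the processor, script and job_arguments are passed to
    # ProcessingStep directly, so run() is never called and the ParameterString is
    # only resolved when the pipeline executes (recent SDK versions emit a
    # DeprecationWarning for this form).
    step = ProcessingStep(
        name="preprocess",
        processor=processor,
        code="preprocess.py",  # hypothetical script path
        job_arguments=["--s3_input_bucket", s3_input_bucket],
    )

    pipeline = Pipeline(
        name="preprocess-pipeline",
        parameters=[s3_input_bucket],
        steps=[step],
    )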

DavidRooney commented 5 months ago

Hi @martinRenou, this is causing some pretty big issues for us at the moment. Do you have any helpful updates on this, please?

martinRenou commented 5 months ago

I'm not working with the SageMaker team at the moment; you may have better luck pinging people who work on this codebase these days.

DavidRooney commented 5 months ago

Thanks for getting back. I tagged you because it says you are assigned to this issue. Can you assign it to someone on the team? There are 425 contributors, so any help knowing who to link to this would be greatly appreciated. The best I can think of is to ping people who have made recent commits 🤷

martinRenou commented 5 months ago

Friendly ping @knikure

DavidRooney commented 4 months ago

Any response at all? We would really like to continue using SageMaker, but working around this issue is taking its toll. @knikure

DavidRooney commented 1 month ago

@martinRenou Is there anyone else to friendly ping on this? knikure unassigned 👎