aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Pipelines: Custom Dependency Between Steps does not work for CallBackStep and LambdaStep #2556

Closed johanneslanger closed 3 years ago

johanneslanger commented 3 years ago

Describe the bug: In SageMaker Pipelines, when calling stepA.add_depends_on([stepB]) where stepB is either a CallbackStep or a LambdaStep, the pipeline fails to serialize or upsert.

To reproduce: Define a pipeline with a step that depends on either a CallbackStep or a LambdaStep, then call pipeline.definition() or pipeline.upsert().

Expected behavior: Custom dependencies should work for LambdaStep and CallbackStep.

Screenshots or logs:

Traceback (most recent call last):
  File "/Users/jlanger/dev/mlops-blog/ml-ops-to-greengrass/cdk/labeling/pipeline/run_pipeline.py", line 177, in <module>
    main()
  File "/Users/jlanger/dev/mlops-blog/ml-ops-to-greengrass/cdk/labeling/pipeline/run_pipeline.py", line 99, in main
    run_pipeline(pipeline=pipeline, role=args.role)
  File "/Users/jlanger/dev/mlops-blog/ml-ops-to-greengrass/cdk/labeling/pipeline/run_pipeline.py", line 158, in run_pipeline
    parsed = json.loads(pipeline.definition())
  File "/Users/jlanger/.pyenv/versions/mlops-blog/lib/python3.9/site-packages/sagemaker/workflow/pipeline.py", line 269, in definition
    return json.dumps(request_dict)
  File "/Users/jlanger/.pyenv/versions/3.9.1/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/jlanger/.pyenv/versions/3.9.1/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/jlanger/.pyenv/versions/3.9.1/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/jlanger/.pyenv/versions/3.9.1/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type CallbackStep is not JSON serializable

johanneslanger commented 3 years ago

I noticed it works when setting:

stepA.add_depends_on([stepB.name])

instead of

stepA.add_depends_on([stepB])

Is this intended and just a documentation mistake?
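
The difference between the two calls can be reproduced without the SDK. The sketch below is only an illustration: `FakeStep` and `definition` are hypothetical stand-ins, not sagemaker classes or functions, and it assumes (as the traceback suggests) that the dependency list is handed to `json.dumps` as given.

```python
import json


class FakeStep:
    """Hypothetical stand-in for a pipeline step; not the SDK's class."""

    def __init__(self, name):
        self.name = name


def definition(depends_on):
    # Assumption: the dependency list is serialized unchanged.
    return json.dumps({"Name": "stepA", "DependsOn": depends_on})


step_b = FakeStep("stepB")

# A list of name strings serializes cleanly.
ok = definition([step_b.name])

# A list containing the step object raises a TypeError,
# the same kind of error shown in the traceback.
try:
    definition([step_b])
    error = None
except TypeError as exc:
    error = str(exc)
```

A plain string is a JSON-native type, while an arbitrary object is not, which would explain why the name-based call succeeds.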

jerrypeng7773 commented 3 years ago

Hi there, the latest release will support passing either a step object or a step name string in the depends-on list.
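
One way to accept both forms is to normalize the list to names before serialization. This is a minimal sketch under that assumption, not the SDK's actual implementation; `FakeStep` and `resolve_depends_on` are hypothetical names introduced for illustration.

```python
import json


class FakeStep:
    """Hypothetical stand-in for a pipeline step; not the SDK's class."""

    def __init__(self, name):
        self.name = name


def resolve_depends_on(depends_on):
    """Return step names for a list mixing name strings and step-like objects."""
    names = []
    for item in depends_on:
        if isinstance(item, str):
            names.append(item)       # already a step name
        else:
            names.append(item.name)  # step-like object: use its name
    return names


mixed = [FakeStep("stepB"), "stepC"]
resolved = resolve_depends_on(mixed)

# The normalized list contains only strings, so it is JSON serializable.
serialized = json.dumps({"DependsOn": resolved})
```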