aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

[Bug Report] Error while creating ProcessingStep #3391

Closed usik closed 2 years ago

usik commented 2 years ago

Link to the notebook https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/nlp_mlops_company_sentiment/02_nlp_company_earnings_analysis_pipeline.ipynb

Describe the bug

While creating a ProcessingStep such as the one below:

    create_dataset_step = ProcessingStep(
        name='HFSECFinBertCreateDataset',
        processor=create_dataset_processor,
        outputs=[sagemaker.processing.ProcessingOutput(output_name='report_data',
                     source='/opt/ml/processing/output/10k10q',
                     destination=f'{inference_input_data}/10k10q'),
                 sagemaker.processing.ProcessingOutput(output_name='article_data',
                     source='/opt/ml/processing/output/articles',
                     destination=f'{inference_input_data}/articles')],

ProcessingOutput does not accept a pipeline parameter as part of a destination or source. It throws "TypeError: Pipeline variables do not support __str__ operation."
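The failure can be reproduced without SageMaker at all. The sketch below uses a hypothetical stand-in class, `FakePipelineParameter`, which is not the SDK's class; it only assumes, as the error message suggests, that string conversion of a pipeline variable is refused. The parameter name and S3 prefix are made up for illustration.

```python
class FakePipelineParameter:
    """Hypothetical stand-in for a pipeline parameter (not the SageMaker class)."""

    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

    def __str__(self):
        # Assumed behaviour, taken from the error message: str() is refused.
        raise TypeError("Pipeline variables do not support __str__ operation.")


inference_input_data = FakePipelineParameter(
    name="InferenceInputData",               # hypothetical parameter name
    default_value="s3://example-bucket/in",  # hypothetical S3 prefix
)

error_message = None
try:
    # Formatting an object inside an f-string falls back to its __str__ method.
    destination = f"{inference_input_data}/10k10q"
except TypeError as err:
    error_message = str(err)

print(error_message)
```

If this reading is right, the error comes from the f-string interpolation itself rather than from ProcessingOutput.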

To reproduce

SageMaker version: 2.88.3
Boto3 version: 1.22.10

  1. Follow the notebook step by step until Step 1, where the processing step is created.
  2. In Step 4, under the section "Create pipeline conditions to check if the Endpoint deployments were successful", there is another processing step.

Logs


TypeError                                 Traceback (most recent call last)
in
      7     outputs=[sagemaker.processing.ProcessingOutput(output_name='report_data',
      8         source='/opt/ml/processing/output/10k10q',
----> 9         destination=f'{inference_input_data}/10k10q'),
     10     sagemaker.processing.ProcessingOutput(output_name='article_data',
     11         source='/opt/ml/processing/output/articles',

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/entities.py in __str__(self)
     79     def __str__(self):
     80         """Override built-in String function for PipelineVariable"""
---> 81         raise TypeError("Pipeline variables do not support __str__ operation.")
     82
     83     def __int__(self):

TypeError: Pipeline variables do not support __str__ operation.

Thank you in advance.

References:
[1] https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput

geremyCohen commented 2 years ago

@usik

Changing

    destination=f'{inference_input_data}/10k10q')
    destination=f'{inference_input_data}/articles')

to

    destination=f'{inference_input_data.default_value}/10k10q')
    destination=f'{inference_input_data.default_value}/articles')

(and all subsequent instances of using this object for string interpolation) appears to fix the issue.
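A minimal sketch of why this change works and what it may trade off, again using a hypothetical stand-in class rather than the real SDK. The `resolve` helper, the parameter name, and the S3 prefixes are all assumptions made for illustration. The point is that `default_value` is an ordinary string, so the f-string succeeds, but the resulting destination is fixed when the pipeline is defined.

```python
class FakePipelineParameter:
    """Hypothetical stand-in for a pipeline parameter (not the SageMaker class)."""

    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

    def __str__(self):
        raise TypeError("Pipeline variables do not support __str__ operation.")


def resolve(param, overrides):
    """Hypothetical helper: simulate an execution-time override of a parameter."""
    return overrides.get(param.name, param.default_value)


inference_input_data = FakePipelineParameter(
    name="InferenceInputData",                    # hypothetical parameter name
    default_value="s3://example-bucket/default",  # hypothetical S3 prefix
)

# The workaround: default_value is a plain str, so interpolation is allowed.
destination = f"{inference_input_data.default_value}/10k10q"

# A value supplied at execution time does not reach the string built above.
overrides = {"InferenceInputData": "s3://example-bucket/override"}
runtime_value = resolve(inference_input_data, overrides)

print(destination)
print(runtime_value)
```

If the parameter needs to stay overridable per execution, the SageMaker Python SDK also offers `sagemaker.workflow.functions.Join`, which concatenates pipeline variables when the pipeline runs. Whether it fits this notebook has not been checked here.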

usik commented 2 years ago

Thank you!