aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0

render_graph() has hidden dependencies #189

Open cshumaker-irb opened 2 years ago

cshumaker-irb commented 2 years ago

render_graph() is improperly documented and is not portable

Use Case

Better support for render_graph() would allow more than exploration in notebooks; it would enable real production software projects. It provides feedback without requiring a deployment through the AWS Console or an automated deployment pipeline, and it helps with documentation and code reviews.

It's a highly valuable tool in the SDK that only works in a notebook.

Proposed Solution

Update the docs:

The second link says it doesn't work in JupyterLab but it doesn't explicitly say that it only works in Jupyter Notebooks. I believe SageMaker and EMR Notebooks both use JupyterLab at this point.

If nothing is going to be done, this implementation should at least be deprecated because it's so selectively useful and is a red herring for people looking for this functionality.

Other

The documentation for the current render_graph() method is very unclear. It does not mention the IPython requirement, and the log message that warns the user can easily be silenced without them ever seeing it.
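To illustrate the point about the warning being easy to miss, here is a minimal sketch of a caller-side check that fails loudly when a dependency is absent instead of relying on a log message. The helper name require_module and its purpose argument are hypothetical and are not part of the SDK; the sketch only uses the standard library.

```python
import importlib.util


def require_module(name, purpose):
    """Raise ImportError if a top-level module is not installed.

    Hypothetical helper for illustration; not part of the SDK.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"{name} is required for {purpose} but is not installed")
    return True
```

A project could call require_module("IPython", "render_graph()") before attempting to render, so the missing dependency surfaces as an exception rather than a silenced log line.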

If you dig into the render_graph() method, you'll find the WorkflowGraphWidget and its show() method, which generates an HTML object for IPython. To get something potentially useful, I inspected the "data" attribute of the generated HTML and found a CSS link, a JS script tag, and a div containing an svg tag. It appears that the widget generates HTML containing JavaScript that renders the definition.
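The inspection described above can be sketched roughly as follows, assuming the generated markup is available as a plain string (for example, from the "data" attribute mentioned above). The helper names summarize_markup and save_markup are hypothetical, and SAMPLE_MARKUP is a made-up stand-in for illustration, not the widget's real output.

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class TagCounter(HTMLParser):
    """Counts opening tags so the structure of a markup string can be inspected."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def summarize_markup(markup):
    """Return a dict mapping each tag name to how many times it opens."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return dict(parser.tags)


def save_markup(markup, path):
    """Wrap the markup in a minimal page and write it to disk."""
    page = "<!DOCTYPE html>\n<html><body>\n" + markup + "\n</body></html>\n"
    target = Path(path)
    target.write_text(page, encoding="utf-8")
    return target


# Made-up stand-in with the same kinds of tags described in the issue;
# this is NOT the actual output of the widget.
SAMPLE_MARKUP = (
    '<link rel="stylesheet" href="graph.css">'
    '<div id="graph"><svg></svg></div>'
    "<script>/* rendering code would go here */</script>"
)
```

In a real session the markup would come from something like workflow.render_graph().data, per the description above. A page saved this way would still depend on whatever the link and script tags reference, so it is a starting point for experimentation rather than a portable rendering.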

This could be much better. For one, the docs need to be updated to reflect the dependencies.

For actual rendering, you could:


This is a :rocket: Feature Request