DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

fix: escape HTML characters in config visualization #1227

Closed zilto closed 1 week ago

zilto commented 1 week ago

This follows #1200. The solution follows the suggestion of @MG-MW.

This problem is specific to config values. Instead of displaying their node_name and type, config nodes display their node_name and value. In the function _get_node_label(), the type_string argument is therefore actually the result of str(value).

Essentially, we use html.escape to encode the unsafe characters ", &, <, and >, and the Graphviz engine converts them back when rendering (e.g., the string &lt; is rendered as <). Inserting this escaped string inside the f-string ensures a valid final string.
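As a rough illustration of the escaping step, here is a minimal sketch using only the standard library. The names node_name, config_value, and the label template are hypothetical stand-ins and not the actual body of _get_node_label(); only html.escape and html.unescape are real calls.

```python
import html

# Hypothetical config entry; in the PR, the displayed text is str(value).
node_name = "threshold"
config_value = 'a < b & c > "d"'

type_string = str(config_value)

# html.escape replaces &, <, > (and quotes, by default) with HTML entities.
safe_type_string = html.escape(type_string)

# Illustrative HTML-like label template (an assumption, not Hamilton's exact one).
label = f"<<b>{node_name}</b><br/><i>{safe_type_string}</i>>"

print(safe_type_string)
print(label)
```

Unescaping the result with html.unescape gives back the original text, which is roughly what the rendering engine does when it draws the label.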

At the same time, we added truncation for the config value. Displaying values can quickly get messy when people pass pd.Series as config values. Escaping characters after truncation gives better results.
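One plausible reason the ordering matters: truncating an already-escaped string can cut an entity in half. The sketch below compares both orders. MAX_LEN, the "..." suffix, and both helper functions are hypothetical and only illustrate the idea; they are not the code from this PR.

```python
import html

MAX_LEN = 12  # hypothetical display limit, not the value used in the PR


def truncate_then_escape(value, max_len=MAX_LEN):
    """Shorten the raw text first, then escape what is left."""
    text = str(value)
    if len(text) > max_len:
        text = text[:max_len] + "..."
    return html.escape(text)


def escape_then_truncate(value, max_len=MAX_LEN):
    """Escape first, then shorten; this can slice through an entity."""
    text = html.escape(str(value))
    if len(text) > max_len:
        text = text[:max_len] + "..."
    return text


raw = "a<b<c<d<e<f<g<h"
good = truncate_then_escape(raw)
bad = escape_then_truncate(raw)

print(good)  # every "<" is a complete &lt; entity
print(bad)   # ends with a dangling "&" left over from a cut entity
```

Truncating first also keeps the length limit in terms of visible characters, since each escaped character would otherwise count as several.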