DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License
1.82k stars 120 forks source link

graph visualization fails because of unescaped config values #1200

Open MG-MW opened 3 days ago

MG-MW commented 3 days ago

Depending on given configuration, graph visualization fails

Current behavior

import hamilton.driver
import hamilton.ad_hoc_utils

def world(hello: str) -> str:
    return f"{hello} world"

temp_module = hamilton.ad_hoc_utils.create_temporary_module(world)

driver = hamilton.driver.Builder().with_modules(temp_module).with_config({"hello": "Hi"}).build()
# does not fail
driver.display_upstream_of("world")

driver = hamilton.driver.Builder().with_modules(temp_module).with_config({"hello": "<"}).build()
# throws error
driver.display_upstream_of("world")

Stack Traces

Stack trace is not really helpful --------------------------------------------------------------------------- CalledProcessError Traceback (most recent call last) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/execute.py:88, in run_check(cmd, input_lines, encoding, quiet, **kwargs) 87 try: ---> 88 proc.check_returncode() 89 except subprocess.CalledProcessError as e: File ~/.local/share/mise/installs/python/3.12.3/lib/python3.12/subprocess.py:502, in CompletedProcess.check_returncode(self) 501 if self.returncode: --> 502 raise CalledProcessError(self.returncode, self.args, self.stdout, 503 self.stderr) CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg']' returned non-zero exit status 1. During handling of the above exception, another exception occurred: CalledProcessError Traceback (most recent call last) File ~/.envs/py312test/lib/python3.12/site-packages/IPython/core/formatters.py:977, in MimeBundleFormatter.__call__(self, obj, include, exclude) 974 method = get_real_method(obj, self.print_method) 976 if method is not None: --> 977 return method(include=include, exclude=exclude) 978 return None 979 else: File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/jupyter_integration.py:98, in JupyterIntegration._repr_mimebundle_(self, include, exclude, **_) 96 include = set(include) if include is not None else {self._jupyter_mimetype} 97 include -= set(exclude or []) ---> 98 return {mimetype: getattr(self, method_name)() 99 for mimetype, method_name in MIME_TYPES.items() 100 if mimetype in include} File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/jupyter_integration.py:112, in JupyterIntegration._repr_image_svg_xml(self) 110 def _repr_image_svg_xml(self) -> str: 111 """Return the rendered graph as SVG string.""" --> 112 return self.pipe(format='svg', encoding=SVG_ENCODING) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:104, in Pipe.pipe(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding) 55 def pipe(self, 56 format: typing.Optional[str] = None, 57 renderer: typing.Optional[str] = None, (...) 61 engine: typing.Optional[str] = None, 62 encoding: typing.Optional[str] = None) -> typing.Union[bytes, str]: 63 """Return the source piped through the Graphviz layout command. 64 65 Args: (...) 102 ' 104 return self._pipe_legacy(format, 105 renderer=renderer, 106 formatter=formatter, 107 neato_no_op=neato_no_op, 108 quiet=quiet, 109 engine=engine, 110 encoding=encoding) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/_tools.py:171, in deprecate_positional_args..decorator..wrapper(*args, **kwargs) 162 wanted = ', '.join(f'{name}={value!r}' 163 for name, value in deprecated.items()) 164 warnings.warn(f'The signature of {func.__name__} will be reduced' 165 f' to {supported_number} positional args' 166 f' {list(supported)}: pass {wanted}' 167 ' as keyword arg(s)', 168 stacklevel=stacklevel, 169 category=category) --> 171 return func(*args, **kwargs) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:121, in Pipe._pipe_legacy(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding) 112 @_tools.deprecate_positional_args(supported_number=2) 113 def _pipe_legacy(self, 114 format: typing.Optional[str] = None, (...) 119 engine: typing.Optional[str] = None, 120 encoding: typing.Optional[str] = None) -> typing.Union[bytes, str]: --> 121 return self._pipe_future(format, 122 renderer=renderer, 123 formatter=formatter, 124 neato_no_op=neato_no_op, 125 quiet=quiet, 126 engine=engine, 127 encoding=encoding) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:149, in Pipe._pipe_future(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding) 146 if encoding is not None: 147 if codecs.lookup(encoding) is codecs.lookup(self.encoding): 148 # common case: both stdin and stdout need the same encoding --> 149 return self._pipe_lines_string(*args, encoding=encoding, **kwargs) 150 try: 151 raw = self._pipe_lines(*args, input_encoding=self.encoding, **kwargs) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/piping.py:212, in pipe_lines_string(engine, format, input_lines, encoding, renderer, formatter, neato_no_op, quiet) 206 cmd = dot_command.command(engine, format, 207 renderer=renderer, 208 formatter=formatter, 209 neato_no_op=neato_no_op) 210 kwargs = {'input_lines': input_lines, 'encoding': encoding} --> 212 proc = execute.run_check(cmd, capture_output=True, quiet=quiet, **kwargs) 213 return proc.stdout File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/execute.py:90, in run_check(cmd, input_lines, encoding, quiet, **kwargs) 88 proc.check_returncode() 89 except subprocess.CalledProcessError as e: ---> 90 raise CalledProcessError(*e.args) 92 return proc CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg']' returned non-zero exit status 1. [stderr: "Error: : syntax error in line 9 near 'subgraph'\n"]

Library & System Information

hamilton version: 1.81.0

$ dot --version
dot - graphviz version 12.0.0 (20240803.0821)

Expected behavior

Escape representations of nodes

Additional context

I could rectify the behaviour by patching this code: https://github.com/DAGWorks-Inc/hamilton/blob/bc7cbbf475a71145846f6e169fc09a5f12d00e23/hamilton/graph.py#L258-L273

like this

    def _get_node_label(
        n: node.Node,
        name: Optional[str] = None,
        type_string: Optional[str] = None,
    ) -> str:
        """Get a graphviz HTML-like node label. It uses the DAG node
        name and type but values can be overridden. Overriding is currently
        used for materializers since `type_` is stored in n.tags.

        ref: https://graphviz.org/doc/info/shapes.html#html
        """
        import html

        name = n.name if name is None else name
        if type_string is None:
            type_string = get_type_as_string(n.type) if get_type_as_string(n.type) else ""

        escaped_type_string=type_string

        # unrelated but my config values can be pretty long
        if len(escaped_type_string) > 80:
            escaped_type_string = escaped_type_string[:80] + "[...]"

        escaped_type_string=html.escape(escaped_type_string, quote=True)
        return f"<<b>{name}</b><br /><br /><i>{escaped_type_string}</i>>"

Given that a python class repr and a configuration value in particular can contain any chars imaginable, these should probably be escaped.

I pretty busy this week, but maybe I can throw a PR with a proper testcase together by next week.

zilto commented 2 days ago

Thanks for raising the issue! Here are a few notes to add to your assessment: