DataRecce / recce

The dbt data-validation toolkit for teams that care about building better data
https://datarecce.io
Apache License 2.0

[DRC-490] [Bug] recce summary generate failed #336

Closed. tomoki-takahashi-oisix closed this issue 5 months ago

tomoki-takahashi-oisix commented 5 months ago

Current Behavior

I ran the recce summary command, but I encountered the following error:

Traceback (most recent call last):
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/recce/event/track.py", line 60, in invoke
    ret = super(TrackCommand, self).invoke(ctx)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/recce/cli.py", line 220, in summary
    output = generate_markdown_summary(ctx, summary_format=kwargs.get('format'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/recce/summary.py", line 389, in generate_markdown_summary
    graph = _build_lineage_graph(base_lineage, curr_lineage)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/recce/summary.py", line 268, in _build_lineage_graph
    graph.create_edge(parent_id, child_id, 'base')
  File "/Users/takahashi_tomoki/Documents/data-cuisine-mwaa/venv/lib/python3.11/site-packages/recce/summary.py", line 222, in create_edge
    raise ValueError(f'Parent node {parent_id} not found in graph')
ValueError: Parent node snapshot.data_cuisine_dbt.advertising_agency_dbt_snapshot not found in graph

Expected Behavior

The recce summary command should execute successfully.

Steps To Reproduce

$ recce run 
$ recce summary recce_state.json

Relevant log output

No response

Environment

Additional Context

No response


even-wei commented 5 months ago

Hi @tomoki-takahashi-oisix

Is there any sensitive information in the Recce state file? If not, sharing the state file with us would speed up the fix. If you can't share it, that's fine; we will start by investigating from the error message.

tomoki-takahashi-oisix commented 5 months ago

As you mentioned, the Recce state file contains confidential information, so I cannot share it.

even-wei commented 5 months ago

Related issue on Slack: https://getdbt.slack.com/archives/C05C28V7CPP/p1718145080517769

popcornylu commented 5 months ago

I can reproduce this issue when the project uses a snapshot.

$ recce summary ./recce_state.json
Traceback (most recent call last):
  File "/Users/popcorny/Documents/infuseai/repo/recce/recce/event/track.py", line 60, in invoke
    ret = super(TrackCommand, self).invoke(ctx)
  File "/Users/popcorny/Documents/infuseai/repo/jaffle_shop_duckdb/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/popcorny/Documents/infuseai/repo/jaffle_shop_duckdb/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/popcorny/Documents/infuseai/repo/recce/recce/cli.py", line 220, in summary
    output = generate_markdown_summary(ctx, summary_format=kwargs.get('format'))
  File "/Users/popcorny/Documents/infuseai/repo/recce/recce/summary.py", line 389, in generate_markdown_summary
    graph = _build_lineage_graph(base_lineage, curr_lineage)
  File "/Users/popcorny/Documents/infuseai/repo/recce/recce/summary.py", line 271, in _build_lineage_graph
    graph.create_edge(parent_id, child_id, 'current')
  File "/Users/popcorny/Documents/infuseai/repo/recce/recce/summary.py", line 224, in create_edge
    raise ValueError(f'Child node {child_id} not found in graph')
ValueError: Child node snapshot.jaffle_shop.my_snapshot not found in graph
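
Both tracebacks end in `create_edge` rejecting a snapshot ID that was never added to the graph. One way this kind of error can arise is when nodes are registered only for some resource types while edges are created for every entry in the parent map. The sketch below illustrates that failure mode only. The `LineageGraph` class, the `build_lineage_graph` helper, the `node_types` parameter, and the layout of the `lineage` dict are all hypothetical, chosen to mirror the error messages above. They are not recce's actual code, and the thread does not say what #339 changed.

```python
# Hypothetical sketch of the failure mode; not recce's implementation.


class LineageGraph:
    """Tiny graph that refuses edges between unregistered nodes."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def create_node(self, node_id):
        self.nodes.add(node_id)

    def create_edge(self, parent_id, child_id):
        # Same checks and messages as seen in the tracebacks.
        if parent_id not in self.nodes:
            raise ValueError(f'Parent node {parent_id} not found in graph')
        if child_id not in self.nodes:
            raise ValueError(f'Child node {child_id} not found in graph')
        self.edges.add((parent_id, child_id))


def build_lineage_graph(lineage, node_types):
    """Register nodes of the given types, then add every edge in parent_map."""
    graph = LineageGraph()
    for node_id, node in lineage['nodes'].items():
        if node['resource_type'] in node_types:
            graph.create_node(node_id)
    for child_id, parent_ids in lineage['parent_map'].items():
        for parent_id in parent_ids:
            graph.create_edge(parent_id, child_id)
    return graph


# Assumed lineage layout: a snapshot sits between two models.
lineage = {
    'nodes': {
        'model.demo.orders': {'resource_type': 'model'},
        'snapshot.demo.my_snapshot': {'resource_type': 'snapshot'},
        'model.demo.report': {'resource_type': 'model'},
    },
    'parent_map': {
        'snapshot.demo.my_snapshot': ['model.demo.orders'],
        'model.demo.report': ['snapshot.demo.my_snapshot'],
    },
}

# Snapshots are skipped during node registration, so the first edge fails.
try:
    build_lineage_graph(lineage, node_types={'model'})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Registering snapshots as nodes lets every edge resolve.
graph = build_lineage_graph(lineage, node_types={'model', 'snapshot'})
print(len(graph.edges))
```

In this sketch, whether the message names the parent or the child depends only on which side of the failing edge the snapshot is on, which would be consistent with the two different messages reported above.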

popcornylu commented 5 months ago

Fixed in #339