DataRecce / recce

The dbt data-validation toolkit for teams that care about building better data
https://datarecce.io
Apache License 2.0

[DRC-561] [Bug] AttributeError: 'list' object has no attribute 'from_dict' #384

Open SBurwash opened 1 month ago

SBurwash commented 1 month ago

Current Behavior

When running the row_count diff, I get the following error:

Future exception was never retrieved
future: <Future finished exception=AttributeError("'list' object has no attribute 'from_dict'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/apis/run_func.py", line 87, in fn
    raise e
  File "/usr/local/lib/python3.11/site-packages/recce/apis/run_func.py", line 80, in fn
    result = task.execute()
             ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/tasks/rowcount.py", line 121, in execute
    return self.execute_dbt()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/tasks/rowcount.py", line 59, in execute_dbt
    node_ids = dbt_adapter.select_nodes(self.params.get('select', ""), self.params.get('exclude', ""))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/adapter/dbt_adapter/__init__.py", line 635, in select_nodes
    manifest = self.manifest.deepcopy()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 1004, in deepcopy
    disabled={k: _deepcopy(v) for k, v in self.disabled.items()},
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 1004, in <dictcomp>
    disabled={k: _deepcopy(v) for k, v in self.disabled.items()},
                 ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 548, in _deepcopy
    return value.from_dict(value.to_dict(omit_none=True))
           ^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'from_dict'
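
The last frames of the traceback suggest that at least one value in the manifest's `disabled` mapping is a list, while `_deepcopy` treats every value as a single node with `to_dict`/`from_dict`. A minimal sketch of that failure pattern follows. `FakeNode`, `naive_deepcopy`, `list_aware_deepcopy`, and the `model.demo.old_model` key are hypothetical names for illustration only; this is not dbt's or recce's actual code, and not necessarily how the fix was implemented.

```python
from dataclasses import dataclass, asdict


@dataclass
class FakeNode:
    """Hypothetical stand-in for a dbt node; not dbt's real class."""
    unique_id: str

    def to_dict(self, omit_none=True):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


def naive_deepcopy(value):
    # Same shape as the failing line in the traceback: assumes a single node.
    return value.from_dict(value.to_dict(omit_none=True))


def list_aware_deepcopy(value):
    # Illustrative alternative: copy element by element when given a list.
    if isinstance(value, list):
        return [naive_deepcopy(item) for item in value]
    return naive_deepcopy(value)


# Assumed shape: each key maps to a list of nodes.
disabled = {"model.demo.old_model": [FakeNode("model.demo.old_model")]}

try:
    {k: naive_deepcopy(v) for k, v in disabled.items()}
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

copied = {k: list_aware_deepcopy(v) for k, v in disabled.items()}
print(error_message)
```

The naive version fails on the list itself before any node is touched, which matches the `'list' object has no attribute 'from_dict'` message above.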

Expected Behavior

I expect to be able to run the row count diff without error.

I get the same error when running `recce run`.

Steps To Reproduce

  1. In main: `dbt compile --target prod --target-path target-base/` (tables already exist in prod)
  2. In main: `dbt docs generate --target prod --target-path target-base/`
  3. In branch: `dbt compile` (tables already exist in dev)
  4. In branch: `dbt compile`
  5. In branch: `recce run`

Relevant log output

INFO:     192.168.65.1:59150 - "GET /api/version HTTP/1.1" 200 OK
INFO:     192.168.65.1:48857 - "GET /api/info HTTP/1.1" 200 OK
INFO:     192.168.65.1:53986 - "POST /api/runs/aggregate HTTP/1.1" 200 OK
INFO:     192.168.65.1:32861 - "POST /api/runs HTTP/1.1" 201 Created
INFO:     192.168.65.1:32861 - "GET /api/runs/cac3aa5e-ab5e-4300-9951-fbc43831db96/wait HTTP/1.1" 200 OK
INFO:     192.168.65.1:32861 - "POST /api/runs/aggregate HTTP/1.1" 200 OK
INFO:     192.168.65.1:37180 - "GET /api/checks HTTP/1.1" 200 OK
INFO:     192.168.65.1:37180 - "GET /api/checks/ab462be2-9083-4f11-a790-dee89d3f8b15 HTTP/1.1" 200 OK
INFO:     192.168.65.1:37180 - "POST /api/checks/ab462be2-9083-4f11-a790-dee89d3f8b15/run HTTP/1.1" 201 Created
Future exception was never retrieved
future: <Future finished exception=AttributeError("'list' object has no attribute 'from_dict'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/apis/run_func.py", line 87, in fn
    raise e
  File "/usr/local/lib/python3.11/site-packages/recce/apis/run_func.py", line 80, in fn
    result = task.execute()
             ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/tasks/rowcount.py", line 121, in execute
    return self.execute_dbt()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/tasks/rowcount.py", line 59, in execute_dbt
    node_ids = dbt_adapter.select_nodes(self.params.get('select', ""), self.params.get('exclude', ""))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/recce/adapter/dbt_adapter/__init__.py", line 635, in select_nodes
    manifest = self.manifest.deepcopy()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 1004, in deepcopy
    disabled={k: _deepcopy(v) for k, v in self.disabled.items()},
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 1004, in <dictcomp>
    disabled={k: _deepcopy(v) for k, v in self.disabled.items()},
                 ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dbt/contracts/graph/manifest.py", line 548, in _deepcopy
    return value.from_dict(value.to_dict(omit_none=True))
           ^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'from_dict'
INFO:     192.168.65.1:37180 - "GET /api/runs/c0bb79fd-4662-4556-9c25-731cbc96c0d9/wait?timeout=2 HTTP/1.1" 200 OK
INFO:     192.168.65.1:37180 - "GET /api/checks/ab462be2-9083-4f11-a790-dee89d3f8b15 HTTP/1.1" 200 OK
INFO:     192.168.65.1:37180 - "POST /api/runs/c0bb79fd-4662-4556-9c25-731cbc96c0d9/cancel HTTP/1.1" 200 OK
INFO:     192.168.65.1:32016 - "GET /api/checks HTTP/1.1" 200 OK
INFO:     192.168.65.1:43032 - "GET /api/info HTTP/1.1" 200 OK
INFO:     192.168.65.1:48994 - "POST /api/runs/aggregate HTTP/1.1" 200 OK

Environment

- recce: 0.25.2
- OS: macOS 14.1.1 (23B81)
- Python: 3.11.7
- Data Warehouse: BigQuery
- dbt: 1.7.15

Additional Context

I am running dbt inside a container, and the dbt project is not at the root of my repo (it lives in a `/dbt` subdirectory).


popcornylu commented 1 month ago

@SBurwash Thanks for your report. It looks like some cached tests left over from a removed node are still present. Could you remove the target/ folder and rerun dbt so that target/ is regenerated from a clean state? That may work as a workaround.

In the meantime, I will try to reproduce the problem and work out a fix.
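
The workaround above can be sketched as a small cleanup helper. The function name `clean_artifact_dirs` and its parameters are hypothetical; it only deletes the named folders under a project directory, and the dbt commands still have to be rerun afterwards to regenerate the artifacts.

```python
import shutil
from pathlib import Path


def clean_artifact_dirs(project_dir, names=("target",)):
    """Delete the named artifact folders under project_dir.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for name in names:
        path = Path(project_dir) / name
        if path.is_dir():
            shutil.rmtree(path)  # removes the folder and everything in it
            removed.append(path)
    return removed
```

For example, `clean_artifact_dirs(".")` clears `target/` in the current directory. Passing `names=("target", "target-base")` would also clear the base artifacts from the reproduction steps, which then need to be regenerated as well.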

popcornylu commented 1 month ago

This issue is fixed. Please verify the fix in the next release, v0.26.0 (ETA 7/16).

SBurwash commented 1 month ago

I'll try that out this morning, thanks @popcornylu !

Will keep you updated on how it turns out

SBurwash commented 1 month ago

It is fixed! Now I'm just trying to figure out a way to make it go faster... will investigate! Thanks :D

SBurwash commented 1 month ago

@popcornylu I am experiencing another issue: when I run `recce run`, I get no log output and the command appears to hang.

Is this normal? Here is all I've seen for the past 5 minutes:

────────────────────────────────────────────────────────────────────────────────────── DBT Artifacts ──────────────────────────────────────────────────────────────────────────────────────
Base:
    Manifest: 2024-07-17 19:35:11.446647+00:00
    Catalog:  2024-07-17 19:41:44.870821+00:00
Current:
    Manifest: 2024-07-17 20:18:55.047562+00:00
    Catalog:  2024-07-17 19:50:12.538694+00:00
────────────────────────────────────────────────────────────────────────────────────── Preset checks ──────────────────────────────────────────────────────────────────────────────────────
20:27:35  Found a seed (<my_table_name>) >1MB in size at the same path, dbt cannot tell if it has changed: assuming they are the same