brooklyn-data / dbt_artifacts

A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts
Apache License 2.0

Improve Performance of Source/Model/Exposure extraction #20

Closed kgpayne closed 2 years ago

kgpayne commented 3 years ago

With ~1 year of historical manifest.json and run_results.json data, we have started experiencing timeouts running --full-refresh of dbt_artifacts.


2021-04-15 15:43:49: 2021-04-15 15:43:49,243 - root - INFO - Database Error in model dim_dbt__sources (models/incremental/dim_dbt__sources.sql)
2021-04-15 15:43:49: 2021-04-15 15:43:49,243 - root - INFO - 000630 (57014): Statement reached its statement or warehouse timeout of 1,200 second(s) and was canceled.

Possible solutions:

kgpayne commented 3 years ago

@NiallRees FYI. We had resolved to try unpacking required fields in the COPY command here, in the hope of solving our 'too much history' problem at the same time as #29 🤔 It's less flexible, but it ensures that the JSON extraction happens on load, meaning the base tables are flat and therefore not a problem during a full refresh.
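
To illustrate the "extract on load" idea in general terms, here is a rough Python sketch, not the package's implementation and not the COPY command itself. It pulls a fixed set of fields out of a manifest at load time, so that whatever is stored is already flat. The function name `flatten_sources` and the field list `SOURCE_FIELDS` are hypothetical, and the manifest layout (a top-level `sources` mapping plus `metadata.generated_at`) is assumed for the example.

```python
import json
from typing import Any, Dict, List

# Hypothetical subset of fields to keep; the real list would depend on
# what the downstream models actually need.
SOURCE_FIELDS = ("name", "source_name", "schema")


def flatten_sources(manifest_json: str) -> List[Dict[str, Any]]:
    """Return one flat row per source found in a manifest JSON string."""
    manifest = json.loads(manifest_json)
    # Assumed layout: artifact timestamp under metadata.generated_at.
    generated_at = manifest.get("metadata", {}).get("generated_at")

    rows = []
    for unique_id, node in manifest.get("sources", {}).items():
        row = {"unique_id": unique_id, "generated_at": generated_at}
        for field in SOURCE_FIELDS:
            # Missing fields become None rather than raising.
            row[field] = node.get(field)
        rows.append(row)
    return rows
```

The trade-off is the one described above: any field not listed at load time is simply not available later, but a full refresh never has to parse the raw JSON history again.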

alanmcruickshank commented 2 years ago

I propose that I try to fix this at the same time as #62. The proposed approach is described there.