InfuseAI / piperider

Code review for data in dbt
https://www.piperider.io/
Apache License 2.0

'bigquery' key error when running compare-reports fails to produce diff summary #932

Open Stochastic-Squirrel opened 8 months ago

Stochastic-Squirrel commented 8 months ago

First of all, I am really enjoying this tool! Unfortunately I have come across this bug which is blocking a rollout to the wider team so I am hoping that there is a quick fix!

Describe the bug

When comparing two piperider reports, a warning "bigquery" is returned and no comparison summary is generated.

$ piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report
────────────────────────────── Comparison report ───────────────────────────────
Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206213745/run.json
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report: 
/data_warehouse/comparison_report/index.html

Reproduce

Unfortunately, I cannot provide the manifest JSONs, but I will do my best to describe the issue and the steps taken.

  1. I generate a run report on production, covering all models in prod
    dbt compile -t prod
    piperider run --dbt-target prod --debug --report-dir $CI_PROJECT_DIR
  2. When I open an MR, I run a dbt run on the modified and new models only in staging
    dbt run --fail-fast -t pre-release --select "state:modified.body+ state:modified.configs+ state:new+" --defer --state /target_prod
  3. I create the staging piperider run report only on the models above
    piperider run --select "state:modified.body+ state:modified.configs+ state:new+" --state /target_prod --dbt-target pre-release --debug --report-dir $CI_PROJECT_DIR
  4. I then compare the two reports
    piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report

What's strange is that the diff summary report works for some MRs but not others. I have tried to find a common trait among the failing ones, but I have not been able to. The MR and subsequent report comparison that works is a very simple test case where I add a text column with a constant value to an existing table, e.g.

...
"apples" as fruit,
...

Looking at the comparison report, row and col information for both base (production) and target (staging) are recorded.

What I have tried

I attached a debugger and I tried to figure out what was going on.

    def lookup_adapter(self, adapter_name: str) -> Adapter:
        return self.adapters[adapter_name]

Relevant code linked here
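
A minimal sketch of why the warning text might be nothing more than 'bigquery', assuming the lookup above raises a KeyError that is caught further up and printed as-is. The AdapterRegistry class, the describe_lookup helper, and the registered "postgres" entry are hypothetical stand-ins for illustration, not PipeRider's actual code.

```python
class AdapterRegistry:
    """Hypothetical stand-in for the object that owns lookup_adapter."""

    def __init__(self, adapters):
        self.adapters = dict(adapters)

    def lookup_adapter(self, adapter_name):
        # Plain dict indexing: raises KeyError if the name was never registered.
        return self.adapters[adapter_name]


def describe_lookup(registry, adapter_name):
    """Return the adapter, or the text a generic handler would print on failure."""
    try:
        return registry.lookup_adapter(adapter_name)
    except KeyError as err:
        # str() of a KeyError is the repr of the missing key, quotes included.
        return str(err)


# Assumption: only a non-BigQuery adapter is registered in this toy registry.
registry = AdapterRegistry({"postgres": object()})
message = describe_lookup(registry, "bigquery")
print(message)
```

If this is what happens, the quoted 'bigquery' in the warning is the missing dictionary key itself rather than a descriptive message, which would point at the adapter not being registered at the time of the lookup.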

Expected behavior

A diff summary report is generated for the dbt models that have changed. Example output from the successful MR comparison:

Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206165611/run.json
Impact Summary:
  Code Changes: added=0, removed=0, modified=2
  Resource Impact: potentially_impacted=7, assessed=7, skipped=0, impacted=5
Comparison report: 
/data_warehouse/comparison_report/index.html
Comparison summary: 
/data_warehouse/comparison_report/summary.md

DaveFlynn commented 8 months ago

@Stochastic-Squirrel Thanks for reporting this issue. The team is taking a look and we'll get back to you shortly

DaveFlynn commented 8 months ago

Hi @Stochastic-Squirrel

We're having some difficulty in reproducing the issue. We'll continue to look into this.

It looks like you already ran PipeRider with --debug? If not, I would suggest that as a further debugging step. There may be a lot of output. If there's nothing sensitive in it, you could share it with us: either attach it here or email it to product@piperider.io

In the meantime, you could try out an Impact Report manually by using the DBT Manifest Analyzer in PipeRider Cloud:

  1. Log into PipeRider Cloud
  2. Click the Analyze tab
  3. Upload two manifest files into the Manifest Analyzer

I'll follow up on this when we have had more success reproducing the issue.

Thanks,

Dave

Stochastic-Squirrel commented 8 months ago

Hi Dave, thanks for reaching out!

I have been using the --debug flag throughout and nothing extra is printed to the console, so unfortunately I haven't been able to glean any more info! I also tried the website just now and got a server error when attempting uploads (Sentry event id: 229c6e399c2b48d08f61682dff0ac69a). I have tried uploading a single manifest as well as two at the same time.

Unfortunately I don't feel comfortable sharing the manifests in their entirety. I'll try to cut them down to a minimal set. What are the essential keys needed? I am thinking of maybe isolating a single table that does not contain any sensitive information.

However, I do have an update on what causes the error! I experimented some more and I noticed that it is only when changing the model YAML files that the error occurs. Changes to the SQL models seem to work fine.

Here are some scenarios that cause the error

In my case, I have to tweak the model YAMLs for SQL models that are affected by a change, e.g. a data type change.

I hope this makes it a bit easier to recreate the issue on your end.

popcornylu commented 8 months ago

@Stochastic-Squirrel

Hi, thanks for the information. I understand there are privacy concerns with providing the real run.json. Would it be possible to provide the two run.json files from a dummy project instead? It would help us reproduce the issue.

Expected reproduction steps

  1. Download your two run.json
  2. Run
    piperider compare-reports --base run_base.json --target run.json --output ./comparison_report
  3. Get error result:
    Selected reports:
      Base:   run_base.json
      Target: run.json
    Warning:
    'bigquery'
    Got problem to generate changeset.
    Comparison report: 
    /data_warehouse/comparison_report/index.html
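
The reproduction steps above could be checked automatically with something like the sketch below. It assumes only the compare-reports flags and the failure message already quoted in this thread; the names FAILURE_MARKER, changeset_failed, and run_compare are hypothetical helpers, not part of PipeRider.

```python
import subprocess

# Text the CLI printed in the failing runs quoted in this thread.
FAILURE_MARKER = "Got problem to generate changeset."


def changeset_failed(cli_output: str) -> bool:
    """True when the captured CLI output contains the changeset failure message."""
    return FAILURE_MARKER in cli_output


def run_compare(base: str, target: str, output_dir: str) -> str:
    """Run compare-reports with the flags used in this thread; return combined output."""
    cmd = [
        "piperider", "compare-reports",
        "--base", base,
        "--target", target,
        "--output", output_dir,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


# Sample output copied from this thread, so the check runs without piperider installed.
failing_output = """Selected reports:
  Base:   run_base.json
  Target: run.json
Warning:
'bigquery'
Got problem to generate changeset.
"""
successful_output = """Impact Summary:
  Code Changes: added=0, removed=0, modified=2
"""
print(changeset_failed(failing_output), changeset_failed(successful_output))
```

With piperider installed, changeset_failed(run_compare("run_base.json", "run.json", "./comparison_report")) would report whether a given pair of run.json files reproduces the issue.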

Stochastic-Squirrel commented 8 months ago

Thanks @popcornylu. I'll try to reproduce this error in a dummy project as soon as I can!