fivetran / dbt_stripe

Data models for Stripe built using dbt.
https://fivetran.github.io/dbt_stripe/
Apache License 2.0
30 stars 31 forks source link

[Bug] Errors when extracting JSON values #45

Closed LewisDavies closed 1 year ago

LewisDavies commented 1 year ago

Is there an existing issue for this?

Describe the issue

I'm trying to use the metadata list variables, e.g. stripe__plan_metadata, to specify the fields to extract but I'm seeing two distinct error pattens:

Relevant error log or model output

-- plan.id
10:51:08  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
10:51:08    001003 (42000): SQL compilation error:
10:51:08    syntax error line 224 at position 21 unexpected '.'.
10:51:08    syntax error line 227 at position 24 unexpected '('.
10:51:08    syntax error line 227 at position 51 unexpected ','.
10:51:08    syntax error line 227 at position 53 unexpected ''feature.id''.
10:51:08    syntax error line 229 at position 11 unexpected 'as'.
10:51:08    compiled SQL at target/run/stripe_source/models/stg_stripe__plan.sql

-- set
10:48:51  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
10:48:51    001003 (42000): SQL compilation error:
10:48:51    syntax error line 259 at position 14 unexpected 'set'.
10:48:51    compiled SQL at target/run/stripe_source/models/stg_stripe__plan.sql

Expected behavior

Ideally, columns should be created from the metadata with valid names. However, I imagine there are a ton of corner cases regarding character substitution and reserved keywords.

An alternative approach would be to pass metadata fields to the final models without modification. This would let users parse the values in a separate view.

dbt Project configurations

vars: stripe__plan_metadata:

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

Core:

Plugins:

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 1 year ago

Hi @LewisDavies thanks so much for opening this issue. This is a very interesting error you are coming across. I have a few quick questions to better understand the root of the issue:

In addition to the above questions, I feel the set issue may possibly be able to be addressed by double quoting the fields in the variable list. Would you be able to attempt the following:

vars:
    stripe__plan_metadata: ['"set"','feature.id','etc.']

I am not positive it will work, but would be worth the try! I will keep digging in on my end to see if there is a better way to address the reserved word issue. In the meantime, it would be great if you could help provide details to my above questions as well. Thanks so much again for raising this with our team 😄

LewisDavies commented 1 year ago

Hey @fivetran-joemarkiewicz, sorry for the slow reply.

fivetran-joemarkiewicz commented 1 year ago

No worries at all @LewisDavies thanks for sharing more information on the issue you are experiencing. Let me try and fiddle around with this a bit and see what I can come back with regarding this parsing issue. 🤔

LewisDavies commented 1 year ago

Hey @fivetran-reneeli, here are a couple of models that don't pull through the metadata fields. The logic is all there, the columns have just been excluded from the final output.

I checked information_schema.columns and found a lot of tables that include a metadata column. The ones at the top are populated in my data, the others are empty:

CUSTOMER
CUSTOMER_BALANCE_TRANSACTION
INVOICE_LINE_ITEM
PLAN
PRICE
PRODUCT

-------------------

APPLICATION_FEE_REFUND
BANK_ACCOUNT
CARD
CHARGE
CHECKOUT_SESSION
COUPON
CREDIT_NOTE
DISPUTE
FILE_LINK
INVOICE
INVOICE_ITEM
ORDER_HISTORY
PAYMENT_INTENT
PAYMENT_METHOD
PAYOUT
PROMOTION_CODE
REFUND
SESSION
SETUP_INTENT
SKU
SOURCE
SUBSCRIPTION_HISTORY
SUBSCRIPTION_ITEM
SUBSCRIPTION_SCHEDULE
TAX_RATE
TRANSFER
TRANSFER_REVERSAL
fivetran-reneeli commented 1 year ago

Hi @LewisDavies ! Thank you for details and appreciate you linking the models.

CraigWilson-ZOE commented 1 year ago

Hi All, I have upgraded to the latest 0.7.4 packages for Stripe (both the source and normal package) and I am still getting very similar errors as the original issue.

11:37:30  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
11:37:30    Syntax error: Unexpected keyword AS at [199:2]
11:37:30    compiled Code at target/run/stripe_source/models/stg_stripe__plan.sql

When I check out the compiled SQL I can see a field called interval listed in the select list:

    interval

 as plan_interval 

This is failing because on BigQuery interval is a reserved word and the query parser isn't liking the syntax.

I believe here that, to ensure issues like this do not happen, at least in BigQuery, they should be quoted using the tick so that reserved words can be referenced in a query i.e.:

    `interval`

 as plan_interval 

The above SQL now passes and the model is able to run.

Can this be looked into please?

It is worth noting that I am not using any vars or any additional configuration apart from running the model directly.

CraigWilson-ZOE commented 1 year ago

Looking at the solution above on using the alias function of the vars stripe__plan_metadata this did not work and will not fix this issue as the original code will always fail:

   interval

 as plan_interval 

    from base
),

final as (

    select 
        id as plan_id,
        active as is_active,
        amount,
        currency,
        plan_interval, -- Field is aliased within get_plan_columns macro
        interval_count,
        metadata,
        nickname,
        product_id

        , replace( 

  json_extract_scalar(metadata, '$.{'name': 'interval'}' )

, '"', '') as {'name':_'interval'},
replace( 

  json_extract_scalar(metadata, '$.{'alias': 'craig_interval'}' )

, '"', '') as {'alias':_'craig_interval'}

This is the SQL that is produced, and as you can see at the top of the query, the keyword is still used, whereas the json_extract is then used later. So the query is not fixed.

CraigWilson-ZOE commented 1 year ago

The interval column is not in the metadata column, but is instead a "normal" column: Screenshot 2022-12-05 at 13 12 24

fivetran-joemarkiewicz commented 1 year ago

Hi @CraigWilson-ZOE thanks for raising this with out team. I think the issue you are detailing is different from the original issue outlined in this Bug Report. This bug report was detailing reserved words within the metadata fields that are being parsed out via a var.

Your bug however, is a field that is explicitly being selected within the model (regardless of the var) and could not be addressed using the metadata solution outlined in this bug report.

I did try to recreate the issue you had using the latest version of the package(s) and saw the interval field was being backticked in the compiled code and ran successfully on BigQuery 🤔 image

The quoting should be happening within the get_plan_columns macro from the source package. The only way I could see dbt skipping over the quote is if the target.type for your dbt adapter is not BigQuery. Would you be able to confirm that the profiles.yml for your dbt target is a BigQuery type in this instance?

For a more immediate fix, you may be able to downgrade just your stripe_source package to v0.7.3 as the only change from v0.7.4 was some under the hood updates and adjusting this interval cast to be performed within the macro opposed to the select statement.

CraigWilson-ZOE commented 1 year ago

Ah! I think you have solved the issue for me actually.

We are using the fal adapter for our dbt, so that we can run Python models locally and not have to spin up a data proc cluster. This means that our target.type won't be BigQuery.

Here is a excerpt from our profiles.yml:

default:
  target: development
  outputs:
    development:
      type: fal
      threads: 16
      db_profile: warehouse

    warehouse:
      type: bigquery
      method: oauth

I am wondering how we might be able to get around this or have this changed to deal with these new type of adapters 🤔

fivetran-joemarkiewicz commented 1 year ago

@CraigWilson-ZOE that is definitely interesting and thanks for sharing! I can see how this would cause an error for you with the target.type not matching our supported warehouses target.type names. That being said, you are using BigQuery so I still feel this can be supported. I am just not entirely familiar with the fal target and how it differs from the basic BigQuery target.

As this is a different conversation from the one being had within this issue, I have opened a new FR on the source package (linked above) for us to continue the discussion. Feel free to add more context within that ticket.

fivetran-reneeli commented 1 year ago

Closing this out as we have updated the package to allow for the metadata json field to be pivoted out. For instructions on how to correctly set it up we've included info in the README!