ClickHouse / dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Apache License 2.0
236 stars 94 forks source link

Testing dbt project jaffle_shop failed due to the column name #25

Closed jovezhong closed 2 years ago

jovezhong commented 2 years ago

Greetings, I am running the jaffle_shop dbt example. dbt seed and dbt run passed but dbt test failed.

The dbt_project.yml

models:
  jaffle_shop:
      materialized: table
      staging:
        materialized: view

via the dbt.log, it broke because it fails to run query like this select * from dbt_alice.customers where customer_id is null

Error log

2021-12-13 20:12:02.037723 (Thread-1): Clickhouse error: Code: 47. DB::Exception: Missing columns: 'customer_id' while processing query: 'SELECT customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order, customer_orders.most_recent_order, customer_orders.number_of_orders, customer_lifetime_value FROM dbt_alice.customers WHERE customer_id IS NULL', required columns: 'customers.customer_id' 'customer_lifetime_value' 'customers.last_name' 'customer_orders.first_order' 'customer_orders.most_recent_order' 'customers.first_name' 'customer_orders.number_of_orders' 'customer_id', maybe you meant: ['customers.customer_id','customer_lifetime_value','customers.last_name','customer_orders.first_order','customer_orders.most_recent_order','customers.first_name','customer_orders.number_of_orders'].

If I wrap the column name with `(tableName).` it works in Clickhouse client select * from dbt_alice.customers where `customers.customer_id` is null

Please consider fixing this. Thanks

jovezhong commented 2 years ago

Well, I realized things are more complicated than that. The columns of dbt_alice.customers are fairly odd

SHOW CREATE TABLE customers

Query id: ccf8a564-36d4-4b51-ae5a-24286beb2d15

┌─statement───────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE dbt_alice.customers ( customers.customer_id Int32, customers.first_name String, customers.last_name String, customer_orders.first_order Date, customer_orders.most_recent_order Date, customer_orders.number_of_orders UInt64, customer_lifetime_value Float64 ) ENGINE = MergeTree ORDER BY tuple() SETTINGS index_granularity = 8192

It seems that Clickhouse just uses the selected columns from different join tables as the new table names. Maybe the dbt-clickhouse adapter need to add alias for the columns

WITH
final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders
        on customers.customer_id = customer_orders.customer_id

    left join customer_payments
        on  customers.customer_id = customer_payments.customer_id

)

select * from final
silentsokolov commented 2 years ago

So

dbt seed ``` Running with dbt=0.21.1 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 178 macros, 0 operations, 3 seed files, 0 sources, 0 exposures 16:39:48 | Concurrency: 1 threads (target='dev') 16:39:48 | 16:39:48 | 1 of 3 START seed file default.raw_customers......................... [RUN] * Deprecation Warning: The quote_columns parameter was not set for seeds, so the default value of False was chosen. The default will change to True in a future release. For more information, see: https://docs.getdbt.com/v0.15/docs/seeds#section-specify-column-quoting 16:39:48 | 1 of 3 OK loaded seed file default.raw_customers..................... [INSERT 100 in 0.34s] 16:39:48 | 2 of 3 START seed file default.raw_orders............................ [RUN] 16:39:49 | 2 of 3 OK loaded seed file default.raw_orders........................ [INSERT 99 in 0.27s] 16:39:49 | 3 of 3 START seed file default.raw_payments.......................... [RUN] 16:39:49 | 3 of 3 OK loaded seed file default.raw_payments...................... [INSERT 113 in 0.27s] 16:39:49 | 16:39:49 | Finished running 3 seeds in 1.23s. Completed successfully Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3 ```
dbt run ``` Running with dbt=0.21.1 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 178 macros, 0 operations, 3 seed files, 0 sources, 0 exposures 16:39:57 | Concurrency: 1 threads (target='dev') 16:39:57 | 16:39:57 | 1 of 5 START view model default.stg_customers........................ [RUN] 16:39:58 | 1 of 5 OK created view model default.stg_customers................... [OK in 0.69s] 16:39:58 | 2 of 5 START view model default.stg_orders........................... [RUN] 16:39:58 | 2 of 5 OK created view model default.stg_orders...................... [OK in 0.64s] 16:39:58 | 3 of 5 START view model default.stg_payments......................... [RUN] 16:39:59 | 3 of 5 OK created view model default.stg_payments.................... [OK in 0.59s] 16:39:59 | 4 of 5 START table model default.customers........................... [RUN] 16:40:00 | 4 of 5 OK created table model default.customers...................... [OK in 0.72s] 16:40:00 | 5 of 5 START table model default.orders.............................. [RUN] 16:40:00 | 5 of 5 OK created table model default.orders......................... [OK in 0.67s] 16:40:00 | 16:40:00 | Finished running 3 view models, 2 table models in 3.66s. Completed successfully Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5 ```
dbt test ``` Running with dbt=0.21.1 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 178 macros, 0 operations, 3 seed files, 0 sources, 0 exposures 16:40:08 | Concurrency: 1 threads (target='dev') 16:40:08 | 16:40:08 | 1 of 20 START test accepted_values_orders_status__placed__shipped__completed__return_pending__returned [RUN] 16:40:08 | 1 of 20 PASS accepted_values_orders_status__placed__shipped__completed__return_pending__returned [PASS in 0.18s] 16:40:08 | 2 of 20 START test accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned [RUN] 16:40:08 | 2 of 20 PASS accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned [PASS in 0.19s] 16:40:08 | 3 of 20 START test accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card [RUN] 16:40:09 | 3 of 20 PASS accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card [PASS in 0.16s] 16:40:09 | 4 of 20 START test not_null_customers_customer_id.................... [RUN] 16:40:09 | 4 of 20 ERROR not_null_customers_customer_id......................... [ERROR in 0.16s] 16:40:09 | 5 of 20 START test not_null_orders_amount............................ [RUN] 16:40:09 | 5 of 20 PASS not_null_orders_amount.................................. [PASS in 0.15s] 16:40:09 | 6 of 20 START test not_null_orders_bank_transfer_amount.............. [RUN] 16:40:09 | 6 of 20 PASS not_null_orders_bank_transfer_amount.................... [PASS in 0.16s] 16:40:09 | 7 of 20 START test not_null_orders_coupon_amount..................... [RUN] 16:40:09 | 7 of 20 PASS not_null_orders_coupon_amount........................... [PASS in 0.15s] 16:40:09 | 8 of 20 START test not_null_orders_credit_card_amount................ [RUN] 16:40:09 | 8 of 20 PASS not_null_orders_credit_card_amount...................... [PASS in 0.18s] 16:40:09 | 9 of 20 START test not_null_orders_customer_id....................... [RUN] 16:40:10 | 9 of 20 PASS not_null_orders_customer_id............................. [PASS in 0.18s] 16:40:10 | 10 of 20 START test not_null_orders_gift_card_amount................. [RUN] 16:40:10 | 10 of 20 PASS not_null_orders_gift_card_amount....................... [PASS in 0.15s] 16:40:10 | 11 of 20 START test not_null_orders_order_id......................... [RUN] 16:40:10 | 11 of 20 PASS not_null_orders_order_id............................... [PASS in 0.15s] 16:40:10 | 12 of 20 START test not_null_stg_customers_customer_id............... [RUN] 16:40:10 | 12 of 20 PASS not_null_stg_customers_customer_id..................... [PASS in 0.17s] 16:40:10 | 13 of 20 START test not_null_stg_orders_order_id..................... [RUN] 16:40:10 | 13 of 20 PASS not_null_stg_orders_order_id........................... [PASS in 0.16s] 16:40:10 | 14 of 20 START test not_null_stg_payments_payment_id................. [RUN] 16:40:10 | 14 of 20 PASS not_null_stg_payments_payment_id....................... [PASS in 0.17s] 16:40:10 | 15 of 20 START test relationships_orders_customer_id__customer_id__ref_customers_ [RUN] 16:40:11 | 15 of 20 ERROR relationships_orders_customer_id__customer_id__ref_customers_ [ERROR in 0.16s] 16:40:11 | 16 of 20 START test unique_customers_customer_id..................... [RUN] 16:40:11 | 16 of 20 ERROR unique_customers_customer_id.......................... [ERROR in 0.15s] 16:40:11 | 17 of 20 START test unique_orders_order_id........................... [RUN] 16:40:11 | 17 of 20 PASS unique_orders_order_id................................. [PASS in 0.16s] 16:40:11 | 18 of 20 START test unique_stg_customers_customer_id................. [RUN] 16:40:11 | 18 of 20 PASS unique_stg_customers_customer_id....................... [PASS in 0.15s] 16:40:11 | 19 of 20 START test unique_stg_orders_order_id....................... [RUN] 16:40:11 | 19 of 20 PASS unique_stg_orders_order_id............................. [PASS in 0.16s] 16:40:11 | 20 of 20 START test unique_stg_payments_payment_id................... [RUN] 16:40:11 | 20 of 20 PASS unique_stg_payments_payment_id......................... [PASS in 0.22s] 16:40:11 | 16:40:11 | Finished running 20 tests in 3.54s. Done. PASS=17 WARN=0 ERROR=3 SKIP=0 TOTAL=20 ```

There are only 3 errors.

I think this because clickhouse use a full name for column with a dot. So jaffle_shop should use alias for columns in the final query for support a database like CH.

If want you can open issue in jaffle_shop.