ClickHouse / dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Apache License 2.0
254 stars 113 forks source link

Null unit test inputs raise error for non-nullable columns #315

Open smoothml opened 4 months ago

smoothml commented 4 months ago

Describe the bug

When using Clickhouse it is common practice to not allows columns to be nullable so as to improve the efficiency of analytic-type queries. However, this is not compatible with how unit tests in DBT handle missing values and so results in an error such as the following:

DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo.

I originally raised this in dbt-core (here) and was directed here. One of the maintainers suggests where the root cause may be and suggested a fix in this comment:

It sounds to me like the root cause is that dbt-clickhouse doesn't yet support dbt unit tests rather than this being a bug in dbt-core. So I'm going to close this issue in favor of you opening an issue in dbt-clickhouse instead.

Specifically, it looks like dbt-clickhouse will need to override this line within get_empty_schema_sql:

      {{ cast('null', col['data_type']) }} as {{ col_name }}{{ ", " if not loop.last }}

Steps to reproduce

Consider the following example, based on the output of the dbt init command:

-- my_first_dbt_model.sql
{{ config(materialized='table') }}

select 1 as id, 'a' AS foo
union all
select 2 as id, 'b' AS foo

-- my_second_dbt_model.sql
select *
from {{ ref('my_first_dbt_model') }}
where id = 1

-- schema.yml
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
unit_tests:
  - name: test_not_null
    model: my_second_dbt_model
    given:
      - input: ref('my_first_dbt_model')
        rows:
          - {id: 1}
          - {id: 2}
    expect:
      rows:
        - {id: 1}

Values for the column foo are omitted in the tests, with only the id values provided. Running dbt build or dbt test results in the error.

Expected behaviour

If a column is not-nullable and a value not provided in the test input, the values used should be the appropriate non-null default. For example, Clickhouse would typically use 0 instead of NULL for a non-nullable numeric column.

dbt and/or ClickHouse server logs

Running dbt build on the above example results in the following stacktrace:

>>> poetry run dbt build
11:10:02  Running with dbt=1.8.3
11:10:03  Registered adapter: clickhouse=1.8.0
11:10:03  Found 2 models, 443 macros, 1 unit test
11:10:03
11:10:03  Concurrency: 1 threads (target='test')
11:10:03
11:10:03  1 of 3 START sql table model `dap_test`.`my_first_dbt_model` ................... [RUN]
11:10:03  1 of 3 OK created sql table model `dap_test`.`my_first_dbt_model` .............. [OK in 0.07s]
11:10:03  2 of 3 START unit_test my_second_dbt_model::test_not_null ...................... [RUN]
11:10:03  2 of 3 ERROR my_second_dbt_model::test_not_null ................................ [ERROR in 0.04s]
11:10:03  3 of 3 SKIP relation dap_test.my_second_dbt_model .............................. [SKIP]
11:10:03
11:10:03  Finished running 1 table model, 1 unit test, 1 view model in 0 hours 0 minutes and 0.27 seconds (0.27s).
11:10:03
11:10:03  Completed with 1 error and 0 warnings:
11:10:03
11:10:03    Runtime Error in unit_test test_not_null (models/example/schema.yml)
  An error occurred during execution of unit test 'test_not_null'. There may be an error in the unit test definition: check the data types.
   Database Error
    Code: 70.
    DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo. Stack trace:

    0. Poco::Exception::Exception(String const&, int) @ 0x000000010ef61640
    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000010906f044
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000104cb991c
    3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000104cc5178
    4. DB::(anonymous namespace)::FunctionCast::prepareUnpackDictionaries(std::shared_ptr<DB::IDataType const> const&, std::shared_ptr<DB::IDataType const> const&) const @ 0x000000010b9cb094
    5. DB::(anonymous namespace)::FunctionCast::prepare(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&) const @ 0x000000010b9c89e8
    6. DB::QueryAnalyzer::resolveFunction(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c68e090
    7. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679854
    8. DB::QueryAnalyzer::resolveExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool) @ 0x000000010c678dd8
    9. DB::QueryAnalyzer::resolveProjectionExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c698918
    10. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c674a04
    11. DB::QueryAnalyzer::resolveUnion(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c676848
    12. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679ae4
    13. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    14. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    15. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679764
    16. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    17. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    18. DB::QueryAnalyzer::resolve(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::IQueryTreeNode> const&, std::shared_ptr<DB::Context const>) @ 0x000000010c673f88
    19. DB::QueryAnalysisPass::run(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::Context const>) @ 0x000000010c6738a0
    20. DB::QueryTreePassManager::run(std::shared_ptr<DB::IQueryTreeNode>) @ 0x000000010c5f47fc
    21. DB::(anonymous namespace)::buildQueryTreeAndRunPasses(std::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::shared_ptr<DB::Context const> const&, std::shared_ptr<DB::IStorage> const&) @ 0x000000010cd88d40
    22. DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x000000010cd86e58
    23. DB::InterpreterSelectQueryAnalyzer::getSampleBlock(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x000000010cd89160
    24. DB::InterpreterCreateQuery::getTablePropertiesAndNormalizeCreateQuery(DB::ASTCreateQuery&, DB::LoadingStrictnessLevel) const @ 0x000000010ccedbec
    25. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x000000010ccf2adc
    26. DB::InterpreterCreateQuery::execute() @ 0x000000010ccfc658
    27. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x000000010d0691b0
    28. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000010d065fd4
    29. DB::TCPHandler::runImpl() @ 0x000000010dd52f00
    30. DB::TCPHandler::run() @ 0x000000010dd68268
    31. Poco::Net::TCPServerConnection::start() @ 0x000000010f012378
11:10:03
11:10:03  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3

Environment

ClickHouse server