When using Clickhouse it is common practice to not allows columns to be nullable so as to improve the efficiency of analytic-type queries. However, this is not compatible with how unit tests in DBT handle missing values and so results in an error such as the following:
DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo.
I originally raised this in dbt-core (here) and was directed here. One of the maintainers suggests where the root cause may be and suggested a fix in this comment:
It sounds to me like the root cause is that dbt-clickhouse doesn't yet support dbt unit tests rather than this being a bug in dbt-core. So I'm going to close this issue in favor of you opening an issue in dbt-clickhouse instead.
Specifically, it looks like dbt-clickhouse will need to override this line within get_empty_schema_sql:
{{ cast('null', col['data_type']) }} as {{ col_name }}{{ ", " if not loop.last }}
Steps to reproduce
Consider the following example, based on the output of the dbt init command:
-- my_first_dbt_model.sql
{{ config(materialized='table') }}
select 1 as id, 'a' AS foo
union all
select 2 as id, 'b' AS foo
-- my_second_dbt_model.sql
select *
from {{ ref('my_first_dbt_model') }}
where id = 1
-- schema.yml
version: 2
models:
- name: my_first_dbt_model
description: "A starter dbt model"
columns:
- name: id
data_type: uint64
- name: foo
data_type: string
- name: my_second_dbt_model
description: "A starter dbt model"
columns:
- name: id
data_type: uint64
- name: foo
data_type: string
unit_tests:
- name: test_not_null
model: my_second_dbt_model
given:
- input: ref('my_first_dbt_model')
rows:
- {id: 1}
- {id: 2}
expect:
rows:
- {id: 1}
Values for the column foo are omitted in the tests, with only the id values provided. Running dbt build or dbt test results in the error.
Expected behaviour
If a column is not-nullable and a value not provided in the test input, the values used should be the appropriate non-null default. For example, Clickhouse would typically use 0 instead of NULL for a non-nullable numeric column.
dbt and/or ClickHouse server logs
Running dbt build on the above example results in the following stacktrace:
>>> poetry run dbt build
11:10:02 Running with dbt=1.8.3
11:10:03 Registered adapter: clickhouse=1.8.0
11:10:03 Found 2 models, 443 macros, 1 unit test
11:10:03
11:10:03 Concurrency: 1 threads (target='test')
11:10:03
11:10:03 1 of 3 START sql table model `dap_test`.`my_first_dbt_model` ................... [RUN]
11:10:03 1 of 3 OK created sql table model `dap_test`.`my_first_dbt_model` .............. [OK in 0.07s]
11:10:03 2 of 3 START unit_test my_second_dbt_model::test_not_null ...................... [RUN]
11:10:03 2 of 3 ERROR my_second_dbt_model::test_not_null ................................ [ERROR in 0.04s]
11:10:03 3 of 3 SKIP relation dap_test.my_second_dbt_model .............................. [SKIP]
11:10:03
11:10:03 Finished running 1 table model, 1 unit test, 1 view model in 0 hours 0 minutes and 0.27 seconds (0.27s).
11:10:03
11:10:03 Completed with 1 error and 0 warnings:
11:10:03
11:10:03 Runtime Error in unit_test test_not_null (models/example/schema.yml)
An error occurred during execution of unit test 'test_not_null'. There may be an error in the unit test definition: check the data types.
Database Error
Code: 70.
DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo. Stack trace:
0. Poco::Exception::Exception(String const&, int) @ 0x000000010ef61640
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000010906f044
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000104cb991c
3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000104cc5178
4. DB::(anonymous namespace)::FunctionCast::prepareUnpackDictionaries(std::shared_ptr<DB::IDataType const> const&, std::shared_ptr<DB::IDataType const> const&) const @ 0x000000010b9cb094
5. DB::(anonymous namespace)::FunctionCast::prepare(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&) const @ 0x000000010b9c89e8
6. DB::QueryAnalyzer::resolveFunction(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c68e090
7. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679854
8. DB::QueryAnalyzer::resolveExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool) @ 0x000000010c678dd8
9. DB::QueryAnalyzer::resolveProjectionExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c698918
10. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c674a04
11. DB::QueryAnalyzer::resolveUnion(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c676848
12. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679ae4
13. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
14. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
15. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679764
16. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
17. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
18. DB::QueryAnalyzer::resolve(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::IQueryTreeNode> const&, std::shared_ptr<DB::Context const>) @ 0x000000010c673f88
19. DB::QueryAnalysisPass::run(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::Context const>) @ 0x000000010c6738a0
20. DB::QueryTreePassManager::run(std::shared_ptr<DB::IQueryTreeNode>) @ 0x000000010c5f47fc
21. DB::(anonymous namespace)::buildQueryTreeAndRunPasses(std::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::shared_ptr<DB::Context const> const&, std::shared_ptr<DB::IStorage> const&) @ 0x000000010cd88d40
22. DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x000000010cd86e58
23. DB::InterpreterSelectQueryAnalyzer::getSampleBlock(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x000000010cd89160
24. DB::InterpreterCreateQuery::getTablePropertiesAndNormalizeCreateQuery(DB::ASTCreateQuery&, DB::LoadingStrictnessLevel) const @ 0x000000010ccedbec
25. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x000000010ccf2adc
26. DB::InterpreterCreateQuery::execute() @ 0x000000010ccfc658
27. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x000000010d0691b0
28. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000010d065fd4
29. DB::TCPHandler::runImpl() @ 0x000000010dd52f00
30. DB::TCPHandler::run() @ 0x000000010dd68268
31. Poco::Net::TCPServerConnection::start() @ 0x000000010f012378
11:10:03
11:10:03 Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3
Environment
dbt version: 1.8.3
dbt-clickhouse version: 1.8.0
clickhouse-driver version (if using native): 0.2.8
clickhouse-connect version (if using http): 0.7.15
Python version: I've observed this behaviour with 3.10 and 3.12
Describe the bug
When using Clickhouse it is common practice to not allows columns to be nullable so as to improve the efficiency of analytic-type queries. However, this is not compatible with how unit tests in DBT handle missing values and so results in an error such as the following:
I originally raised this in
dbt-core
(here) and was directed here. One of the maintainers suggests where the root cause may be and suggested a fix in this comment:Steps to reproduce
Consider the following example, based on the output of the
dbt init
command:Values for the column
foo
are omitted in the tests, with only theid
values provided. Runningdbt build
ordbt test
results in the error.Expected behaviour
If a column is not-nullable and a value not provided in the test input, the values used should be the appropriate non-null default. For example, Clickhouse would typically use 0 instead of
NULL
for a non-nullable numeric column.dbt and/or ClickHouse server logs
Running
dbt build
on the above example results in the following stacktrace:Environment
ClickHouse server