Closed: Somtom closed this issue 6 months ago.
I did some further investigation and found the following:
When

```sql
EXCHANGE TABLES `dap_test`.`my_test_model_base__dbt_backup` AND `dap_test`.`my_test_model_base` ON CLUSTER "company_cluster";
```

is run, the tables seem to be switched based on their definition. The distributed table also still points to the `my_test_model_base` model as before (but we expect this one to be updated).
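One quick way to confirm which local table the Distributed table targets (my own check, not part of the original report) is to inspect its definition:

```sql
-- Shows the Distributed engine definition, including the target database/table
SHOW CREATE TABLE dap_test.my_test_model;
```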
However, if I run `SELECT * FROM dap_test.my_test_model_base`, I also get only 1 column. The same goes for `SELECT * FROM dap_test.my_test_model`. However, if I run `SELECT * FROM dap_test.my_test_model_base__dbt_backup`, I get the 2 columns.
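To compare the column sets directly (a check I would suggest, not taken from the original report), the schemas of the exchanged tables can be listed side by side:

```sql
-- Compare schemas of the exchanged tables after EXCHANGE TABLES
DESCRIBE TABLE dap_test.my_test_model_base;
DESCRIBE TABLE dap_test.my_test_model_base__dbt_backup;
```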
When I redefine the `my_test_model` table after the `EXCHANGE TABLES` step with the following statement, I get the correct columns:

```sql
create or replace table `dap_test`.`my_test_model`
ON CLUSTER "prod" as `dap_test`.`my_test_model_base`
ENGINE = Distributed('prod', 'dap_test', 'my_test_model_base', rand());
```
@genzgd I created a small PR with a change that fixed it for me locally: https://github.com/ClickHouse/dbt-clickhouse/pull/230 Maybe that can help in figuring out a solution. I am happy to help, but I have hardly any context on dbt adapters.
When updating local tables underneath a distributed table, you should drop and refresh the distributed table in ClickHouse. Distributed table support was only added to this adapter recently and was, at least until recently, experimental. Is there any step you found that would drop and recreate the Distributed table, @Somtom?
Distributed tables don't store any data, and those that point at the schema of a local table need to be refreshed, as you just did, when the underlying local table is updated. However, there isn't really a problem with replacing the distributed tables each time.
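A minimal sketch of that refresh, reusing the table names from the example above (the cluster name is an assumption carried over from the `EXCHANGE TABLES` statement):

```sql
-- Drop the stale Distributed table, then recreate it against the
-- updated local table so it picks up the new schema
DROP TABLE IF EXISTS dap_test.my_test_model ON CLUSTER "company_cluster" SYNC;
CREATE TABLE dap_test.my_test_model ON CLUSTER "company_cluster"
    AS dap_test.my_test_model_base
    ENGINE = Distributed('company_cluster', 'dap_test', 'my_test_model_base', rand());
```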
I guess a dbt full refresh of the table would also fix the issue (assuming it's not incremental). And if I am right, dropping the distributed table in a pre-hook should also work?
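As a rough, untested sketch of the pre-hook idea: the cluster name, the trivial `select`, and the `distributed_table` materialization name are assumptions, not taken from the original report:

```sql
{{ config(
    materialized='distributed_table',
    pre_hook=["DROP TABLE IF EXISTS {{ this }} ON CLUSTER 'company_cluster' SYNC"]
) }}

-- hypothetical model body; the real model would go here
select 1 as col_a
```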
@emirkmo Unfortunately, I did not look into dropping and recreating the distributed table. I also no longer work with that setup, so it is hard for me to look into it.
This appears to be fixed by #230 without the need to drop and recreate the distributed tables, but I could see it being reopened. Unfortunately we don't currently have the resources to fully investigate issues with community contributed experimental features.
Describe the bug
When I try to add columns to a model in a distributed setup using `ReplicatedMergeTree`, I am not able to add new columns. Running the model for the first time succeeds. Running it again also works. Adding new columns and then running it produces the following error:

Steps to reproduce
I am using a local ClickHouse cluster setup from this repo and changed the ClickHouse server version to 23.8.4.
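For illustration, the kind of model change that triggers the error can be as small as this (a hypothetical model, not the exact one from the report):

```sql
-- models/my_test_model_base.sql, first run (succeeds):
select 1 as col_a

-- same file on a later run, after adding a column (fails in the
-- distributed setup):
select 1 as col_a, 2 as col_b
```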
- `dbt_project.yml`
- `profiles.yml`
- `dbt-clickhouse` driver
- `dbt init` command
- `dbt run`
Expected behaviour
It should be possible to modify the data models.
Code examples, such as models or profile settings
For the code, see the reproducible example above.
dbt and/or ClickHouse server logs
Configuration
Environment
ClickHouse server
See the cluster setup information at the top.