It sounds like after running `add-node` I should run the regular `snuba migrations migrate --force` on the additional nodes to get the necessary migration steps completed. Hopefully it will let me, and not say that a migration has already been completed for that group.
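(For reference: before forcing anything, the recorded migration status can be checked first. A minimal sketch; `snuba migrations list` is the documented status command, and the `CLICKHOUSE_HOST` override is the same one used later in this thread. The hostname is this thread's example, not a fixed value:)

```bash
# Print each migration group's migrations and their recorded status
# (completed / in progress / not started).
snuba migrations list

# Then run a single group against the node in question.
CLICKHOUSE_HOST=clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local \
  snuba migrations migrate -g transactions --force
```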
Without performing any additional actions, in both scenarios `timeseries_id` and `_error_ids_hashed` were later found in their respective tables on the new node. How does that happen? At first I thought that, due to the exception, the tables were not created, but in fact they are! And checking their fields, they look normal.
**Correction**: some tables are in fact missing.
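(A quick way to see which tables actually exist on a given node, assuming `clickhouse-client` access and the `default` database used in this thread, is to list `system.tables` on the new node and diff against a healthy replica:)

```bash
# Enumerate tables on the new node; run the same query against an existing
# replica and diff the two lists to find what add-node failed to create.
clickhouse-client \
  --host clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local \
  --query "SELECT name FROM system.tables WHERE database = 'default' ORDER BY name"
```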
Rerunning `add-node` on one of the affected nodes for the missing table, in this case `transactions_local`, still complains about missing fields:
```
snuba@sentry-snuba-api-59554c5fbc-z9ks6:/usr/src/snuba$ snuba migrations add-node --type local --storage-set transactions --host-name clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local --port 9000 --database default
2024-08-08 22:53:34,733 Initializing Snuba...
2024-08-08 22:53:36,318 Snuba initialization took 1.5789565416052938s
{"module": "snuba.migrations.operations", "event": "Executing CREATE TABLE IF NOT EXISTS transactions_local (project_id UInt64, event_id UUID, trace_id UUID, span_id UInt64, transaction_name LowCardinality(String), transaction_hash UInt64 MATERIALIZED cityHash64(transaction_name), transaction_op LowCardinality(String), transaction_status UInt8 DEFAULT 2, start_ts DateTime, start_ms UInt16, finish_ts DateTime, finish_ms UInt16, duration UInt32, platform LowCardinality(String), environment LowCardinality(Nullable(String)), release LowCardinality(Nullable(String)), dist LowCardinality(Nullable(String)), sdk_name LowCardinality(String) DEFAULT '', sdk_version LowCardinality(String) DEFAULT '', ip_address_v4 Nullable(IPv4), ip_address_v6 Nullable(IPv6), user String DEFAULT '', user_hash UInt64 MATERIALIZED cityHash64(user), user_id Nullable(String), user_name Nullable(String), user_email Nullable(String), tags Nested(key String, value String), _tags_flattened String, contexts Nested(key String, value String), _contexts_flattened String, partition UInt16, offset UInt64, message_timestamp DateTime, retention_days UInt16, deleted UInt8) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/transactions/{shard}/default/transactions_local', '{replica}', deleted) ORDER BY (project_id, toStartOfDay(finish_ts), transaction_name, cityHash64(span_id)) PARTITION BY (retention_days, toMonday(finish_ts)) SAMPLE BY cityHash64(span_id) TTL finish_ts + toIntervalDay(retention_days) SETTINGS index_granularity=8192;", "severity": "info", "timestamp": "2024-08-08T22:53:36.506090Z"}
Traceback (most recent call last):
File "/usr/src/snuba/snuba/clickhouse/native.py", line 200, in execute
result_data = query_execute()
^^^^^^^^^^^^^^^
File "/usr/src/snuba/snuba/clickhouse/native.py", line 183, in query_execute
return conn.execute( # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 373, in execute
rv = self.process_ordinary_query(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 571, in process_ordinary_query
return self.receive_result(with_column_types=with_column_types,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 204, in receive_result
return result.get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/result.py", line 50, in get_result
for packet in self.packet_generator:
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 220, in packet_generator
packet = self.receive_packet()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 237, in receive_packet
raise packet.exception
clickhouse_driver.errors.ServerException: Code: 47.
DB::Exception: Missing columns: 'timestamp' while processing query: 'timestamp', required columns: 'timestamp' 'timestamp'. Stack trace:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe3b83d5 in /opt/bitnami/clickhouse/bin/clickhouse
1. ? @ 0x8b5f18c in /opt/bitnami/clickhouse/bin/clickhouse
2. DB::TreeRewriterResult::collectUsedColumns(std::shared_ptr<DB::IAST> const&, bool, bool) @ 0x13e5eb79 in /opt/bitnami/clickhouse/bin/clickhouse
3. DB::TreeRewriter::analyze(std::shared_ptr<DB::IAST>&, DB::NamesAndTypesList const&, std::shared_ptr<DB::IStorage const>, std::shared_ptr<DB::StorageSnapshot> const&, bool, bool, bool, bool) const @ 0x13e67b98 in /opt/bitnami/clickhouse/bin/clickhouse
4. DB::IndexDescription::getIndexFromAST(std::shared_ptr<DB::IAST> const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x14234414 in /opt/bitnami/clickhouse/bin/clickhouse
5. DB::IndicesDescription::parse(String const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x142354c1 in /opt/bitnami/clickhouse/bin/clickhouse
6. DB::ReplicatedMergeTreeTableMetadata::checkEquals(DB::ReplicatedMergeTreeTableMetadata const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) const @ 0x14b329ca in /opt/bitnami/clickhouse/bin/clickhouse
7. DB::StorageReplicatedMergeTree::checkTableStructure(String const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&) @ 0x143643a9 in /opt/bitnami/clickhouse/bin/clickhouse
8. DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(String const&, String const&, bool, DB::StorageID const&, String const&, DB::StorageInMemoryMetadata const&, std::shared_ptr<DB::Context>, String const&, DB::MergeTreeData::MergingParams const&, std::unique_ptr<DB::MergeTreeSettings, std::default_delete<DB::MergeTreeSettings>>, bool, DB::RenamingRestrictions) @ 0x1435a8d5 in /opt/bitnami/clickhouse/bin/clickhouse
9. ? @ 0x14b44142 in /opt/bitnami/clickhouse/bin/clickhouse
10. DB::StorageFactory::get(DB::ASTCreateQuery const&, String const&, std::shared_ptr<DB::Context>, std::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x1429205b in /opt/bitnami/clickhouse/bin/clickhouse
11. DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&) @ 0x139777d1 in /opt/bitnami/clickhouse/bin/clickhouse
12. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x13970542 in /opt/bitnami/clickhouse/bin/clickhouse
13. DB::InterpreterCreateQuery::execute() @ 0x1397ccb4 in /opt/bitnami/clickhouse/bin/clickhouse
14. ? @ 0x13efde87 in /opt/bitnami/clickhouse/bin/clickhouse
15. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x13efb40d in /opt/bitnami/clickhouse/bin/clickhouse
16. DB::TCPHandler::runImpl() @ 0x14cd1c11 in /opt/bitnami/clickhouse/bin/clickhouse
17. DB::TCPHandler::run() @ 0x14ce7719 in /opt/bitnami/clickhouse/bin/clickhouse
18. Poco::Net::TCPServerConnection::start() @ 0x17c4ef54 in /opt/bitnami/clickhouse/bin/clickhouse
19. Poco::Net::TCPServerDispatcher::run() @ 0x17c5017b in /opt/bitnami/clickhouse/bin/clickhouse
20. Poco::PooledThread::run() @ 0x17dcd527 in /opt/bitnami/clickhouse/bin/clickhouse
21. Poco::ThreadImpl::runnableEntry(void*) @ 0x17dcaf5d in /opt/bitnami/clickhouse/bin/clickhouse
22. start_thread @ 0x7ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
23. __clone @ 0xfca2f in /lib/x86_64-linux-gnu/libc-2.31.so
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/snuba/snuba/cli/migrations.py", line 372, in add_node
Runner.add_node(
File "/usr/src/snuba/snuba/migrations/runner.py", line 548, in add_node
op.execute_new_node(storage_sets, node_type, clickhouse)
File "/usr/src/snuba/snuba/migrations/operations.py", line 595, in execute_new_node
clickhouse.execute(sql)
File "/usr/src/snuba/snuba/clickhouse/native.py", line 283, in execute
raise ClickhouseError(e.message, code=e.code) from e
snuba.clickhouse.errors.ClickhouseError: DB::Exception: Missing columns: 'timestamp' while processing query: 'timestamp', required columns: 'timestamp' 'timestamp'. Stack trace:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe3b83d5 in /opt/bitnami/clickhouse/bin/clickhouse
1. ? @ 0x8b5f18c in /opt/bitnami/clickhouse/bin/clickhouse
2. DB::TreeRewriterResult::collectUsedColumns(std::shared_ptr<DB::IAST> const&, bool, bool) @ 0x13e5eb79 in /opt/bitnami/clickhouse/bin/clickhouse
3. DB::TreeRewriter::analyze(std::shared_ptr<DB::IAST>&, DB::NamesAndTypesList const&, std::shared_ptr<DB::IStorage const>, std::shared_ptr<DB::StorageSnapshot> const&, bool, bool, bool, bool) const @ 0x13e67b98 in /opt/bitnami/clickhouse/bin/clickhouse
4. DB::IndexDescription::getIndexFromAST(std::shared_ptr<DB::IAST> const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x14234414 in /opt/bitnami/clickhouse/bin/clickhouse
5. DB::IndicesDescription::parse(String const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) @ 0x142354c1 in /opt/bitnami/clickhouse/bin/clickhouse
6. DB::ReplicatedMergeTreeTableMetadata::checkEquals(DB::ReplicatedMergeTreeTableMetadata const&, DB::ColumnsDescription const&, std::shared_ptr<DB::Context const>) const @ 0x14b329ca in /opt/bitnami/clickhouse/bin/clickhouse
7. DB::StorageReplicatedMergeTree::checkTableStructure(String const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&) @ 0x143643a9 in /opt/bitnami/clickhouse/bin/clickhouse
8. DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(String const&, String const&, bool, DB::StorageID const&, String const&, DB::StorageInMemoryMetadata const&, std::shared_ptr<DB::Context>, String const&, DB::MergeTreeData::MergingParams const&, std::unique_ptr<DB::MergeTreeSettings, std::default_delete<DB::MergeTreeSettings>>, bool, DB::RenamingRestrictions) @ 0x1435a8d5 in /opt/bitnami/clickhouse/bin/clickhouse
9. ? @ 0x14b44142 in /opt/bitnami/clickhouse/bin/clickhouse
10. DB::StorageFactory::get(DB::ASTCreateQuery const&, String const&, std::shared_ptr<DB::Context>, std::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x1429205b in /opt/bitnami/clickhouse/bin/clickhouse
11. DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&) @ 0x139777d1 in /opt/bitnami/clickhouse/bin/clickhouse
12. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x13970542 in /opt/bitnami/clickhouse/bin/clickhouse
13. DB::InterpreterCreateQuery::execute() @ 0x1397ccb4 in /opt/bitnami/clickhouse/bin/clickhouse
14. ? @ 0x13efde87 in /opt/bitnami/clickhouse/bin/clickhouse
15. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x13efb40d in /opt/bitnami/clickhouse/bin/clickhouse
16. DB::TCPHandler::runImpl() @ 0x14cd1c11 in /opt/bitnami/clickhouse/bin/clickhouse
17. DB::TCPHandler::run() @ 0x14ce7719 in /opt/bitnami/clickhouse/bin/clickhouse
18. Poco::Net::TCPServerConnection::start() @ 0x17c4ef54 in /opt/bitnami/clickhouse/bin/clickhouse
19. Poco::Net::TCPServerDispatcher::run() @ 0x17c5017b in /opt/bitnami/clickhouse/bin/clickhouse
20. Poco::PooledThread::run() @ 0x17dcd527 in /opt/bitnami/clickhouse/bin/clickhouse
21. Poco::ThreadImpl::runnableEntry(void*) @ 0x17dcaf5d in /opt/bitnami/clickhouse/bin/clickhouse
22. start_thread @ 0x7ea7 in /lib/x86_64-linux-gnu/libpthread-2.31.so
23. __clone @ 0xfca2f in /lib/x86_64-linux-gnu/libc-2.31.so
```
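(The stack trace points at the cause: `StorageReplicatedMergeTree::checkTableStructure` compares the table being created against the metadata already registered in ZooKeeper by the fully migrated replicas, and that metadata references a `timestamp` column, apparently in a skip index judging by `IndicesDescription::parse` in the trace, which the first-migration schema replayed by `add-node` does not have yet. A hedged sketch for inspecting the ZooKeeper-side metadata directly, with the `{shard}` macro from the CREATE TABLE above substituted by hand:)

```bash
# system.zookeeper requires an exact path in the WHERE clause. The path below
# is the one from the CREATE TABLE in the log, with {shard} replaced by the
# shard number in question (assumed here to be 1).
clickhouse-client \
  --host clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local \
  --query "SELECT name, value FROM system.zookeeper
           WHERE path = '/clickhouse/tables/transactions/1/default/transactions_local'
             AND name IN ('metadata', 'columns')"
```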
So, as per my comment here and per the help text from `add-node`:

> All of the SQL operations for the provided storage sets will be run. Any non-SQL (Python) operations will be skipped.
How do I rerun the "Python migrations" that have additional steps that may have been missed by `add-node`? If I do something like

```
CLICKHOUSE_HOST=clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local snuba migrations migrate -g transactions --force
```
I get

```
Finished running migrations
```
Do I have to reverse all migrations for the transactions group and rerun them all on the problematic node? Is there a way to reverse an entire group's migrations rather than a single migration step?

...Sure, I could just add the missing `timestamp` column to the table, but I want to be holistic and run all the missing steps in the "Python migration operations".
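(For reference: Snuba records migration state in a `migrations_local` table in ClickHouse itself, which would explain `migrate --force` immediately reporting "Finished running migrations" — the group is already marked completed regardless of which node is missing DDL. And as far as I can tell there is no single reverse-a-whole-group command; migrations are reversed one id at a time, newest first. A hedged sketch; the table name comes from the Snuba source and its exact columns may differ by version:)

```bash
# Inspect the migration bookkeeping table directly.
clickhouse-client --host <host> \
  --query "SELECT * FROM default.migrations_local LIMIT 50"

# Reverse one migration at a time (newest first), then re-run the group.
# Migration ids are whatever "snuba migrations list" reports for your version.
snuba migrations reverse --group transactions --migration-id <newest_migration_id>
snuba migrations migrate -g transactions --force
```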
hi @chipzzz
Are you running the migration commands on the new node before you add it to the cluster? Because `add-node` basically runs all the migrations in order, the initial CREATE TABLE statement (from the very first migration) won't match the current state of the tables on the other replicas.
An alternative to creating tables on new nodes with `add-node` is to copy the CREATE TABLE statements from an existing replica and then run them on the new node. I don't think we have documentation for this, and it's been a while since I verified it, but we did add a copy-tables script (https://github.com/getsentry/snuba/blob/master/scripts/copy_tables.py) that was supposed to help with this. I don't believe this will work for adding a new shard though, since the ZooKeeper path has to change for the new shard.
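(A minimal sketch of that copy approach, assuming `clickhouse-client` access to both nodes; the `shard1-0` hostname is an assumed sibling of the node named in this thread. Because the statement uses the `{shard}`/`{replica}` macros, each node registers itself under its own ZooKeeper path, so it can usually be replayed verbatim on a new replica of the same shard:)

```bash
# Dump the current schema from a healthy replica. TSVRaw keeps real newlines
# instead of escaping them, so the output is directly replayable.
clickhouse-client \
  --host clickhouse-shard1-0.clickhouse-headless.sentry-dev.svc.cluster.local \
  --query "SHOW CREATE TABLE default.transactions_local" --format TSVRaw \
  > create_transactions_local.sql

# Replay it on the new node.
clickhouse-client \
  --host clickhouse-shard1-1.clickhouse-headless.sentry-dev.svc.cluster.local \
  --multiquery < create_transactions_local.sql
```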
Let me know if I missed something from your comments, and I'll take another look.
@MeredithAnya, that's one thing I didn't try: deploying the node without connecting it to the cluster and running the commands there. I will have to revisit this. However, I think I would run into more issues rebalancing ClickHouse (https://clickhouse.com/docs/en/guides/sre/scaling-clusters). That's a whole other problem.
For now I've opted out of shards (they were causing lots of problems during migrations) and am just using replicas. When I find the need to scale I'll revisit this. But I agree: replicating and sharding ClickHouse alongside Sentry is a mess and not well documented.
@chipzzz feel free to open a PR for the documentation change. Closing this issue now.
When running `add-node` operations to create tables on additional ClickHouse nodes (shards), I get a local table vs. ZooKeeper table mismatch, where ZooKeeper has additional fields, e.g. `timeseries_id`. Another case is that the schema has missing columns, e.g. `_error_ids_hashed`.

The question I have is: when running `add-node`, do I need to run it in local mode if I run Snuba in distributed mode? I figured I'd run both, since both `_local` and `_dist` tables are present on the original node, but do I need local tables on additional nodes? How does table replication happen across additional nodes?

Ref: https://github.com/getsentry/snuba/blob/master/MIGRATIONS.md
Error 1 - ClickHouse vs. ZooKeeper table metadata mismatch - happens for multiple tables
Error 2 - Missing required schema fields - happens for multiple tables