ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

ClickHouse startup keeps crashing with mutationsFinalizingTask exception, how to bypass it? #66115

Open kazhou2024 opened 2 weeks ago

kazhou2024 commented 2 weeks ago

Question

Hi CH crew, I found that one replica keeps crashing during startup, and there are many mutationsFinalizingTask errors for different tables, even though there is only ONE record in the system.mutations table. What is the CH startup process, and is there any setting to bypass mutation execution? We found that one table has no data, parts, or mutations, but it still gets the StorageReplicatedMergeTree::mutationsFinalizingTask error. It seems that even with no mutations, CH still needs to load the mutations, check their execution, and then update ZK for each table? Could this be optimized, since ZK may not be stable?

2024.07.04 21:31:13.194521 [ 463 ] {} (0d503ce9-043b-44ae-b305-3143a61d5c7b): void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. DB::Exception: Table is in readonly mode (replica path: /clickhouse/xxx/xxxx/tables/10/xxx/replicas/xx-xx-xx-10-01719836866). (TABLE_IS_READ_ONLY), Stack trace (when copying this message, always include the lines below):

  1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c5c4fbb
  2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000077960ec
  3. DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x00000000077b068b
  4. DB::StorageReplicatedMergeTree::assertNotReadonly() const @ 0x00000000113c3921
  5. void std::function::policy_invoker<void ()>::__call_impl<std::function::default_alloc_func<DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(String const&, String const&, DB::LoadingStrictnessLevel, DB::StorageID const&, String const&, DB::StorageInMemoryMetadata const&, std::shared_ptr, String const&, DB::MergeTreeData::MergingParams const&, std::unique_ptr<DB::MergeTreeSettings, std::default_delete>, DB::RenamingRestrictions, bool)::$_4, void ()>>(std::function::policy_storage const*) @ 0x000000001152658c
  6. DB::BackgroundSchedulePool::threadFunction() @ 0x000000000f9a7a40
  7. void std::function::policy_invoker<void ()>::__call_impl<std::function::default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const)::$_0&&)::'lambda'(), void ()>>(std::function::policy_storage const*) @ 0x000000000f9a8ae7
  8. void std::thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::thread_struct, std::default_delete>, void ThreadPoolImpl::scheduleImpl(std::function<void ()>, Priority, std::optional, bool)::'lambda0'()>>(void) @ 0x000000000c676f83
  9. ? @ 0x000072d916a2a609
  10. ? @ 0x000072d916945353 (version 24.5.1.1763 (official build))
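
To see how many tables are affected, log lines like the one above can be tallied per table UUID. The following is a minimal sketch, not a ClickHouse tool: the regular expression is an assumption derived only from the single line quoted here, and `tally_readonly_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, taken from the one log line quoted in this issue:
#   ... (<table uuid>): void <task>(): Code: <n>. DB::Exception: <message> (<ERROR_NAME>), ...
LINE_RE = re.compile(
    r"\((?P<uuid>[0-9a-f-]{36})\): "
    r"void (?P<task>[\w:]+)\(\): "
    r"Code: (?P<code>\d+)\. DB::Exception: (?P<message>.*?) "
    r"\((?P<name>[A-Z_]+)\)"
)

# The log line from this issue, cut off after the error name.
SAMPLE = (
    "2024.07.04 21:31:13.194521 [ 463 ] {} (0d503ce9-043b-44ae-b305-3143a61d5c7b): "
    "void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. "
    "DB::Exception: Table is in readonly mode (replica path: "
    "/clickhouse/xxx/xxxx/tables/10/xxx/replicas/xx-xx-xx-10-01719836866). "
    "(TABLE_IS_READ_ONLY)"
)


def tally_readonly_errors(lines):
    """Count TABLE_IS_READ_ONLY errors per (table UUID, background task)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        # Lines that do not fit the assumed shape are skipped silently.
        if match and match["name"] == "TABLE_IS_READ_ONLY":
            counts[(match["uuid"], match["task"])] += 1
    return counts


if __name__ == "__main__":
    for (uuid, task), n in tally_readonly_errors([SAMPLE]).most_common():
        print(uuid, task, n)
```

In practice the input would be the lines of the server error log rather than `SAMPLE`. A count spread over many UUIDs would suggest that every replicated table hits the read-only check, not just the one with a pending mutation.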

CurtizJ commented 2 weeks ago

Does the table really fail to start up? I think the replica should start as read-only in that case.

kazhou2024 commented 1 week ago

Does the table really fail to start up? I think the replica should start as read-only in that case.

Yes. We have 3*16 replicas, and one replica kept crashing for 6 hours with the above error, since we have 200+ tables. It finally started successfully once there were no more ZK errors.