ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

Why not skip the rename process and continue fetching broken parts from replicas to other local disks when a disk is lost? #27938

Open ditgittube opened 3 years ago

ditgittube commented 3 years ago

When a disk is lost, ClickHouse tries to rename the part into the detached directory, but throws this exception:

2021.08.21 16:13:48.967419 [ 27384 ] {} <Error> default.customer (ReplicatedMergeTreePartCheckThread): void DB::ReplicatedMergeTreePartCheckThread::run(): Code: 107, e.displayText() = DB::Exception: Part directory /home/omm/clickhouse/data2/clickhouse/store/83c/83c8ba1e-d9ba-4c91-b96d-d54fa666a74a/all_1_1_0/ doesn't exist. Most likely it is a logical error., Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xd7a1ef0 in /home/omm/clickhouse/usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x4847654 in /home/omm/clickhouse/usr/bin/clickhouse
2. DB::IMergeTreeDataPart::renameTo(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) const (.cold) @ 0x43027e5 in /home/omm/clickhouse/usr/bin/clickhouse
3. DB::IMergeTreeDataPart::renameToDetached(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0xa98250b in /home/omm/clickhouse/usr/bin/clickhouse
4. DB::MergeTreeData::forgetPartAndMoveToDetached(std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) @ 0xa9c288f in /home/omm/clickhouse/usr/bin/clickhouse
5. DB::ReplicatedMergeTreePartCheckThread::checkPart(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (.cold) @ 0x433e635 in /home/omm/clickhouse/usr/bin/clickhouse
6. DB::ReplicatedMergeTreePartCheckThread::run() @ 0xaadd587 in /home/omm/clickhouse/usr/bin/clickhouse
7. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0xae2e56e in /home/omm/clickhouse/usr/bin/clickhouse
8. DB::BackgroundSchedulePool::threadFunction() @ 0xae2ea07 in /home/omm/clickhouse/usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, unsigned long, char const*)::'lambda'()>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, unsigned long, char const*)::'lambda'()&&)::'lambda'()::operator()() @ 0xae2eb46 in /home/omm/clickhouse/usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x48a6f53 in /home/omm/clickhouse/usr/bin/clickhouse
11. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()> >(void*) @ 0x48a681f in /home/omm/clickhouse/usr/bin/clickhouse
12. start_thread @ 0x7e15 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0x101fed in /usr/lib64/libc-2.17.so
 (version 21.3.4.25)

Why not skip the rename step and continue fetching the broken parts from replicas to the other local disks?

alexey-milovidov commented 3 years ago

Yes, if the part doesn't exist, it is pointless to attempt renaming it.
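
To illustrate the idea being agreed on here, below is a minimal standalone sketch, not the actual ClickHouse code. The function `detachPartIfPresent`, the enum `DetachOutcome`, and the `broken_` prefix on the detached name are hypothetical names chosen for the example. It only shows the "check that the part directory exists before renaming" pattern with `std::filesystem`, so that a missing directory leads to a skip instead of an exception.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for this sketch (not a ClickHouse type).
enum class DetachOutcome
{
    MovedToDetached, // the part directory existed and was renamed
    SkippedMissing   // the part directory was not found, so nothing was renamed
};

// Hypothetical helper: move <table_dir>/<part_name> to
// <table_dir>/detached/broken_<part_name>, but only if it is still there.
DetachOutcome detachPartIfPresent(const fs::path & table_dir, const std::string & part_name)
{
    const fs::path part_dir = table_dir / part_name;

    // Use the non-throwing overload: on a lost disk the lookup itself may fail,
    // and either "not found" or "error" means there is nothing to rename.
    std::error_code ec;
    if (!fs::exists(part_dir, ec) || ec)
        return DetachOutcome::SkippedMissing;

    const fs::path detached_dir = table_dir / "detached";
    fs::create_directories(detached_dir);

    // Assumed naming scheme for this example only.
    fs::rename(part_dir, detached_dir / ("broken_" + part_name));
    return DetachOutcome::MovedToDetached;
}
```

In this sketch, a caller that receives `DetachOutcome::SkippedMissing` would simply go on to schedule a fetch of the part from another replica, which is the behaviour requested in this issue.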