cigarl opened 3 years ago
Hi, this is your first issue in the IoTDB project. Thanks for your report. Welcome to the community!
For the first point, the framing mechanism is actually mentioned in the Raft paper in its discussion of the snapshot implementation, and we should approach our implementation the same way. Can you file an issue, and we will evaluate the priority of this rework then?
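For reference, a minimal sketch of the chunked transfer described by the Raft paper's InstallSnapshot RPC (a byte offset plus a done flag); the class and method names below are illustrative, not IoTDB's actual thrift definitions:

```java
// Sketch only: follows the chunking idea from the Raft paper's InstallSnapshot RPC,
// not IoTDB's existing code. Each chunk stays well below the thrift frame limit.
public class SnapshotChunkSender {

  private static final int MAX_CHUNK_BYTES = 64 * 1024 * 1024; // hypothetical chunk budget

  /** Splits a serialized snapshot into chunks and sends them in order. */
  public void sendSnapshot(byte[] snapshot, Follower follower) {
    int offset = 0;
    while (offset < snapshot.length) {
      int len = Math.min(MAX_CHUNK_BYTES, snapshot.length - offset);
      byte[] chunk = new byte[len];
      System.arraycopy(snapshot, offset, chunk, 0, len);
      boolean done = offset + len == snapshot.length;
      // The follower appends the chunk at the given offset and only installs
      // the snapshot once the final chunk (done == true) has arrived.
      follower.installSnapshotChunk(offset, chunk, done);
      offset += len;
    }
  }

  /** Placeholder for the follower-side RPC; illustrative only. */
  interface Follower {
    void installSnapshotChunk(long offset, byte[] data, boolean done);
  }
}
```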
On the second point, the work on mTree Snapshot seems important in this scenario. I think the judgment about Peer Recovery is also correct.
I suggest you ask these questions in the community group or on the mailing list after filing the issue, so that more people will notice and join the discussion. That way it is more likely to be scheduled and resolved sooner.
Thanks for the reminder. I will send an email to the community later and try to address some of these problems next week (e.g., we might have duplicate requests in the `CatchupTask`).
Description
When my environment (3 nodes, 3 replicas) has network fluctuations, or a node is overloaded and responds slowly, I find that after the node rejoins the cluster, `CatchupTask` may corrupt my cluster. So I did some analysis and found the following. Please correct me if there is anything wrong.

Question
`CatchupTask` does not limit the amount of data handled for a single slot. In other words, a slot can end up with too many schemas or files (like slot[981] and slot[911]), which makes catch-up a heavy operation. Besides, since we limit the maximum thrift frame size to 512 MB, such a request cannot be sent to the other node successfully. When a slot is blocked on such a request, the request fails and is retried repeatedly. At the same time, as operations accumulate, the request grows even larger and will never be executed successfully. These threads keep occupying resources, and their number keeps increasing. A sketch of the size-bounded batching I have in mind follows.
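A minimal sketch of that batching, assuming a hypothetical helper class (not existing IoTDB code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: batches the serialized entries of one slot so that no
// single thrift request can approach the 512 MB frame limit.
public class SlotBatcher {

  // Keep well below the 512 MB thrift frame limit to leave room for headers.
  private static final long MAX_REQUEST_BYTES = 256L * 1024 * 1024;

  /** Splits the serialized entries of a slot into requests bounded by MAX_REQUEST_BYTES. */
  public static List<List<byte[]>> splitIntoRequests(List<byte[]> slotEntries) {
    List<List<byte[]>> requests = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long currentBytes = 0;
    for (byte[] entry : slotEntries) {
      if (!current.isEmpty() && currentBytes + entry.length > MAX_REQUEST_BYTES) {
        requests.add(current); // close the current request and start a new one
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(entry);
      currentBytes += entry.length;
    }
    if (!current.isEmpty()) {
      requests.add(current);
    }
    return requests;
  }
}
```

The exact budget matters less than the invariant: no single catch-up request can come close to the frame limit, so a slot can never get permanently stuck retrying an unsendable request.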
The ordering of `local recovery` and `peer recovery` is not controlled. `local recovery` can be slow because of a large `mlog.bin`, but `peer recovery` has already begun. Although I haven't found exactly what went wrong, it is obvious that the CPU load climbs and the log files report a lot of errors (because `CatchupTask` has already restored the schema, and `local recovery` then repeats those operations). See the sketch below for the ordering I have in mind.
Some thinking
Maybe, in `CatchupTask`, we need to control both the size of a single request and the amount of data handled per slot. Assuming a single slot holds one million schemas, we might need to split the catch-up into 10 or more operations. We also need to bound the size of the entire request so that it never exceeds the thrift frame limit (512 MB).

In addition, when a node restarts, `local recovery` should be prioritized, and `peer recovery` should start only after `local recovery` is complete. We should also consider whether the trigger conditions for `mtree-snapshot` are too strict: if no snapshot is taken for a long time, the `mlog.bin` file becomes very large and the recovery speed of nodes in the cluster becomes inconsistent, which may cause other problems (for example, slowly recovering nodes cannot connect to the others, operations are repeated during recovery, and so on). WDYT?