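The 2 min 10 s delayed deletion of global_table is too slow; under high concurrency the table holds too much data. Are there any optimization options?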

PeppaO commented 2 weeks ago

[ ] I have searched the issues of this repository and believe that this is not a duplicate.

Ⅰ. Issue Description

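The 2 min 10 s delayed deletion of global_table is too slow, and under high concurrency the table accumulates too many rows. Are there any optimization options? Besides lowering the 130 s parameter, are there other approaches? If cleanup only catches up during low-traffic periods, I am worried that during peak traffic the backlog in global_table will hurt TPS and waste database disk space.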

PeppaO commented 2 weeks ago

@slievrly @funky-eyes @lightClouds917 @wangliang181230

slievrly commented 2 weeks ago

Increase queryLimit
Asynchronous task to clean up global_table and perform data sharding
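
The two suggestions above can be pictured together as a background task that removes finished global sessions in bounded batches. The sketch below is an in-memory illustration only, not Seata's implementation: `BatchedCleanupSketch`, `purgeExpired`, and `passesToDrain` are hypothetical names, a `Deque` of finish timestamps stands in for the rows of global_table, and the `queryLimit` argument merely mirrors the idea of the setting mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a Deque of finish timestamps stands in for global_table rows.
final class BatchedCleanupSketch {

    /**
     * Removes up to queryLimit rows whose finish time is at least delayMillis old.
     * Rows are assumed to be ordered by finish time, oldest first.
     * Returns the number of rows removed in this pass.
     */
    static int purgeExpired(Deque<Long> finishedAt, long now, long delayMillis, int queryLimit) {
        int removed = 0;
        while (removed < queryLimit
                && !finishedAt.isEmpty()
                && now - finishedAt.peekFirst() >= delayMillis) {
            finishedAt.pollFirst();
            removed++;
        }
        return removed;
    }

    /** Counts how many cleanup passes are needed to drain `rows` expired rows. */
    static int passesToDrain(int rows, int queryLimit) {
        Deque<Long> table = new ArrayDeque<>();
        for (int i = 0; i < rows; i++) {
            table.addLast(0L); // every row finished at t = 0
        }
        int passes = 0;
        // now = 130 000 ms, delay = 130 000 ms: every row is already eligible.
        while (purgeExpired(table, 130_000L, 130_000L, queryLimit) > 0) {
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("limit 100:  " + passesToDrain(3000, 100) + " passes");
        System.out.println("limit 1000: " + passesToDrain(3000, 1000) + " passes");
    }
}
```

With 3000 eligible rows, a limit of 100 needs 30 passes while a limit of 1000 needs 3, which is why a larger batch size shortens the backlog when each pass runs on a fixed schedule.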

funky-eyes commented 1 week ago

I have envisioned the following solutions:

Sharding: assuming there are 3 TC nodes, give each node its own global table name (global_table01, global_table02, and so on), then remove the distributed-lock-table parameter so that the distributed lock is not enabled. Each TC node then performs delayed deletion on its own global table, in parallel with the others.
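
One way to picture the per-node table naming is a small helper that derives the table name from a node index. This is an illustrative sketch only: `ShardTableNames`, `tableFor`, and `tablesFor` are hypothetical, and how each TC node would actually be pointed at its own table is a deployment detail the thread does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: maps a 1-based TC node index to its own global table name.
final class ShardTableNames {

    /** Returns e.g. "global_table01" for node 1, "global_table02" for node 2. */
    static String tableFor(int nodeIndex) {
        if (nodeIndex < 1) {
            throw new IllegalArgumentException("nodeIndex must be >= 1");
        }
        return String.format(Locale.ROOT, "global_table%02d", nodeIndex);
    }

    /** Lists the table names for a cluster of nodeCount TC nodes. */
    static List<String> tablesFor(int nodeCount) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= nodeCount; i++) {
            names.add(tableFor(i));
        }
        return names;
    }

    public static void main(String[] args) {
        // Each node would scan and clean only the table printed for it.
        System.out.println(tablesFor(3));
    }
}
```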

Coordination through Raft, Redis, or the DB, so that the nodes communicate directly or indirectly:

Raft: once a Raft cluster is set up, the leader periodically scans transactions in the committing and rollbacking states and distributes the xids that have passed the 2 min 10 s threshold across the nodes. For example, with 3 TC nodes and 3000 transactions awaiting delayed deletion, the leader sends each node a task of 1000 xids, and every node runs the delayed-deletion logic on its share, which raises concurrency.

Redis: add a distributed lock to elect a leader among the nodes. The leader then does the same job as in the Raft mode, publishing tasks with lpush while nodes consume them with rpop. With, say, 1000 xids per task, 3000 xids means 3 lpush calls, and whichever TC node consumes a task performs the delayed deletion, so the work again runs in parallel.

DB: similar to the Redis approach. Add a distributed lock and have the leader publish tasks to a task table. Each node queries only the first row of the task table and then tries to delete it; the node whose delete succeeds has claimed the task and executes it.

These approaches all aim to raise the concurrency of delayed deletion, whether through sharding, Raft, Redis, or the DB.
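
The leader-splits, workers-claim pattern shared by the Raft, Redis, and DB variants can be sketched in memory. This is a hedged illustration, not code from Seata: `DeleteTaskQueueSketch`, `split`, and `claim` are hypothetical names, and a `ConcurrentLinkedQueue` stands in for the Redis list or the DB task table, with `poll()` playing the role of rpop or of the delete that claims a row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of "leader publishes tasks, TC nodes claim them".
final class DeleteTaskQueueSketch {

    /** Leader side: splits the expired xids into tasks of at most batchSize each. */
    static List<List<String>> split(List<String> xids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> tasks = new ArrayList<>();
        for (int from = 0; from < xids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, xids.size());
            tasks.add(new ArrayList<>(xids.subList(from, to)));
        }
        return tasks;
    }

    /** Worker side: atomically takes one task, or returns null when none is left. */
    static List<String> claim(Queue<List<String>> queue) {
        return queue.poll();
    }

    public static void main(String[] args) {
        List<String> xids = new ArrayList<>();
        for (int i = 0; i < 3000; i++) {
            xids.add("xid-" + i);
        }
        // The leader publishes 3 tasks of 1000 xids each.
        Queue<List<String>> queue = new ConcurrentLinkedQueue<>(split(xids, 1000));

        // Each claim stands for one TC node picking up a task.
        int node = 1;
        List<String> task;
        while ((task = claim(queue)) != null) {
            System.out.println("TC node " + node + " deletes " + task.size() + " sessions");
            node++;
        }
    }
}
```

Because each task is handed out at most once, several nodes can call `claim` concurrently without deleting the same xids twice; in a real deployment that guarantee would have to come from Redis or the database rather than from an in-process queue.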

apache / incubator-seata