apache / shardingsphere-elasticjob

Distributed scheduled job
Apache License 2.0

New instance not able to trigger resharding #1840

Open nevenchen opened 3 years ago

nevenchen commented 3 years ago

VERSION: 3.0.0.rc1 Project: ElasticJob-Lite

Expected behavior:

I have a job in my app, which is deployed on 2 nodes. The job sets 'sharding-total-count' to 3. I start one node first. Once all 3 sharding items are running on that node, I start the second node.

The "job-sharding-strategy-class" is left as default. After the second node starts, I expect it to take at least one sharding item. That means: Node1: takes 2 threads to run the job. Node2: takes 1 thread to run the job.
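
To make the expected split concrete, here is a minimal sketch. It assumes the default strategy divides the items evenly and gives any remainder to the first instances in the list. `AverageSplitSketch` and `split` are hypothetical names used only for illustration, not ElasticJob classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AverageSplitSketch {

    // Hypothetical helper (not ElasticJob's API): assigns items 0..total-1 evenly,
    // then hands any remaining items one by one to the first instances in the list.
    static Map<String, List<Integer>> split(List<String> instances, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (instances.isEmpty()) {
            return result;
        }
        int perInstance = shardingTotalCount / instances.size();
        int next = 0;
        for (String instance : instances) {
            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < perInstance; i++) {
                items.add(next++);
            }
            result.put(instance, items);
        }
        // Distribute the remainder, starting again from the first instance.
        for (String instance : instances) {
            if (next >= shardingTotalCount) {
                break;
            }
            result.get(instance).add(next++);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> instances = Arrays.asList("10.37.27.20@-@17912", "10.37.27.20@-@18540");
        System.out.println(split(instances, 3));
    }
}
```

Under that assumption, the first instance ends up with two items and the second with one, which is the 2/1 split I expected.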

Actual behavior:

Node1: takes 3 threads to run the job. Node2: takes 0 threads to run the job.

Reason analyze (If you can)

The ReconcileService checks whether resharding is needed:

    protected void runOneIteration() {
        int reconcileIntervalMinutes = configService.load(true).getReconcileIntervalMinutes();
        if (reconcileIntervalMinutes > 0 && (System.currentTimeMillis() - lastReconcileTime >= reconcileIntervalMinutes * 60 * 1000)) {
            lastReconcileTime = System.currentTimeMillis();
            if (!shardingService.isNeedSharding() && shardingService.hasShardingInfoInOfflineServers()) {
                log.warn("Elastic Job: job status node has inconsistent value,start reconciling...");
                shardingService.setReshardingFlag();
            }
        }
    }

It finds all items under the "instances" node in ZK. In my case, they are: {10.37.27.20@-@17912, 10.37.27.20@-@18540}

Then each of those values is compared with the sharding items under the "sharding" node in ZK. In my case, the values are: {0: 10.37.27.20@-@17912, 1: 10.37.27.20@-@17912, 2: 10.37.27.20@-@17912 }

As all values under "sharding" equal 10.37.27.20@-@17912, which is present in "instances", resharding is not triggered. As a result, the second app node never gets a chance to take a sharding item.
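
The check described above can be reproduced with a small stand-alone sketch. It is a simplified model and not ElasticJob's actual implementation: the method below only mirrors the behavior I described (an item counts as inconsistent only when its owner is missing from "instances"), and `idleInstances` is a hypothetical helper added to show what the check does not look at.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReconcileCheckSketch {

    // Simplified stand-in for the offline-server check: returns true only if some
    // sharding item is owned by an instance that is no longer listed as online.
    static boolean hasShardingInfoInOfflineServers(List<String> onlineInstances, Map<Integer, String> sharding) {
        for (String owner : sharding.values()) {
            if (!onlineInstances.contains(owner)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper (not part of ElasticJob): online instances that own no item.
    static Set<String> idleInstances(List<String> onlineInstances, Map<Integer, String> sharding) {
        Set<String> idle = new LinkedHashSet<>(onlineInstances);
        idle.removeAll(sharding.values());
        return idle;
    }

    public static void main(String[] args) {
        List<String> instances = Arrays.asList("10.37.27.20@-@17912", "10.37.27.20@-@18540");
        Map<Integer, String> sharding = new LinkedHashMap<>();
        sharding.put(0, "10.37.27.20@-@17912");
        sharding.put(1, "10.37.27.20@-@17912");
        sharding.put(2, "10.37.27.20@-@17912");

        // No owner is offline, so the reconcile condition is never satisfied.
        System.out.println(hasShardingInfoInOfflineServers(instances, sharding));
        // The new instance is online but holds nothing.
        System.out.println(idleInstances(instances, sharding));
    }
}
```

With the data from my ZK nodes, the offline check is false while the second instance is idle, so nothing in this path sets the resharding flag.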

Steps to reproduce the behavior.

Example codes for reproduce this issue (such as a github link).

TeslaCN commented 3 years ago

Hi @nevenchen Could you dump your data in Zookeeper? Refer to https://shardingsphere.apache.org/elasticjob/current/en/user-manual/elasticjob-lite/operation/dump/

jingting-zhang commented 2 years ago

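@TeslaCN ElasticJob does not support resharding immediately after a new instance is added (that is, reassigning shards that have been assigned but not yet executed to the new instance). It can only reshard at the next execution cycle. Could this be supported, and is it necessary to support it?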