Open emar-kar opened 4 months ago
Thanks for your valuable comment!
The sharding mechanism is used for distributed task scheduling.
Let's say you have 4 executors, each running on a different physical machine, and you want to do some port scanning (identifying whether a port is open for a given range of IPs, e.g. figuring out how many web servers are running across the entire IPv4 space). To speed up the scan, it's better to distribute the tasks evenly across these executors.
When creating the scheduler, provide WithNumShards and WithShard as arguments, e.g. WithNumShards(4) and WithShard(0) for the first of four executors.
// executor 0
scheduler := gojob.New(
	gojob.WithShard(0),
	gojob.WithNumShards(4),
)
// executor 1
scheduler := gojob.New(
	gojob.WithShard(1),
	gojob.WithNumShards(4),
)
// executor 2
scheduler := gojob.New(
	gojob.WithShard(2),
	gojob.WithNumShards(4),
)
// executor 3
scheduler := gojob.New(
	gojob.WithShard(3),
	gojob.WithNumShards(4),
)
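To make the idea concrete, here is a minimal sketch of how such a split could work. It assumes that every executor submits the same task list in the same order and keeps only the tasks whose index maps to its own shard (round-robin by index). The names belongsToShard and assign are made up for illustration and are not part of gojob; the library's actual rule may differ.

```go
package main

import "fmt"

// belongsToShard reports whether the task with the given 0-based submission
// index would run on the executor configured with (shard, numShards).
// Assumption: round-robin by submission order; gojob's real rule may differ.
func belongsToShard(index, shard, numShards int) bool {
	return index%numShards == shard
}

// assign returns, for each shard, the indexes of the tasks it would keep.
func assign(numTasks, numShards int) [][]int {
	kept := make([][]int, numShards)
	for shard := 0; shard < numShards; shard++ {
		for i := 0; i < numTasks; i++ {
			if belongsToShard(i, shard, numShards) {
				kept[shard] = append(kept[shard], i)
			}
		}
	}
	return kept
}

func main() {
	// 10 tasks split across 4 executors.
	for shard, tasks := range assign(10, 4) {
		fmt.Printf("executor %d keeps tasks %v\n", shard, tasks)
	}
}
```

Under this assumption each task is kept by exactly one executor, but only if all shard numbers from 0 to numShards-1 are actually running. A shard that nobody starts means its tasks are skipped everywhere.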
Thanks a lot, I guess it would be better to print some log about the sharding configuration.
Thanks for your explanation. But in this case, there should be something above the schedulers, like an orchestrator, to deliver tasks to specific executors. Am I wrong? Otherwise, I don't see a practical use for sharding at the scheduler level. If I'm running it on one physical machine, it's very easy to misconfigure it so that my tasks never get executed. And even if I start several schedulers on different machines, I still need to keep the configuration of each of them in mind, since there is no guarantee that my task will be submitted.
Maybe it's better to return an error from Submit?
UPD: I got the main idea of sharding and see its purpose now, but I would prefer to at least get some flexible control over task submission, so that I can, for example, do something like
for _, ex := range executors {
	if err := ex.Submit(myTask); err != nil {
		// log error and continue
	} else {
		break
	}
}
Update: Sharding can be disabled by simply omitting the sharding-related options when constructing the scheduler.
scheduler := gojob.New()
I will create a new example and a doc file to illustrate the sharding mechanism later.
In the current implementation of "sharding", under certain conditions a task will never be submitted, and the user will not receive any notification about it. What is actually being sharded? What's the purpose of it?