WangYihang / gojob

Go(od) Job is a simple job scheduler that supports task retries, logging, and task sharding.
MIT License

BUG with sharding: task will never be submitted without any notification #9

Open emar-kar opened 4 months ago

emar-kar commented 4 months ago

In the current implementation of sharding, under certain conditions a task is never submitted, and the user receives no notification about it. What is actually being sharded, and what is its purpose?

WangYihang commented 4 months ago

Thanks for your valuable comment!

The sharding mechanism is used for distributed task scheduling.

Let's say you have 4 executors, each running on a different physical machine, and you want to do some port scanning (identify whether a port is open for a given range of IPs, for example to figure out how many web servers are running on the entire IPv4 space). To speed up the scanning process, it's better to distribute the tasks evenly across these executors.

When creating the scheduler on each executor, pass WithShard and WithNumShards as arguments, for example WithShard(0) and WithNumShards(4) on executor 0.

// executor 0
scheduler := gojob.New(
    gojob.WithShard(0),
    gojob.WithNumShards(4),
)

// executor 1
scheduler := gojob.New(
    gojob.WithShard(1),
    gojob.WithNumShards(4),
)

// executor 2
scheduler := gojob.New(
    gojob.WithShard(2),
    gojob.WithNumShards(4),
)

// executor 3
scheduler := gojob.New(
    gojob.WithShard(3),
    gojob.WithNumShards(4),
)

Thanks a lot. I guess it would be better to log the sharding configuration.
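To make the silent-skip behaviour concrete, here is a minimal, self-contained sketch. It does not use gojob itself: `shardFilter` and `acceptedIndexes` are hypothetical names, and the round-robin rule (task index modulo the number of shards) is an assumption about how such a filter might work, not something confirmed in this thread.

```go
package main

import "fmt"

// shardFilter is a hypothetical stand-in for a sharded scheduler. It counts
// every submitted task and keeps only those whose index maps to its shard.
type shardFilter struct {
	shard     int // index of this executor, 0 <= shard < numShards
	numShards int // total number of executors
	seen      int // number of tasks submitted so far
}

// accept reports whether the next submitted task belongs to this shard.
// Assumption: tasks are assigned round-robin by submission order.
func (f *shardFilter) accept() bool {
	mine := f.seen%f.numShards == f.shard
	f.seen++
	return mine
}

// acceptedIndexes submits total tasks and returns the indexes this shard keeps.
func acceptedIndexes(shard, numShards, total int) []int {
	f := &shardFilter{shard: shard, numShards: numShards}
	var kept []int
	for i := 0; i < total; i++ {
		if f.accept() {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	// Four executors share eight tasks; each keeps a disjoint subset.
	for shard := 0; shard < 4; shard++ {
		fmt.Printf("executor %d runs tasks %v\n", shard, acceptedIndexes(shard, 4, 8))
	}
	// A single task submitted to shard 1 of 4 is skipped without any signal.
	fmt.Println("lone task on shard 1:", acceptedIndexes(1, 4, 1))
}
```

Under this assumed rule, a single executor configured with a non-zero shard drops its first task, which matches the symptom reported in this issue.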

emar-kar commented 4 months ago

Thanks for your explanation. But in this case, shouldn't there be something above the schedulers, like an orchestrator, to deliver tasks to specific executors? Otherwise, I don't see a practical use for sharding at the scheduler level. If I'm running it on one physical machine, it's very easy to misconfigure it so that my tasks never get executed. And even if I start several schedulers on different machines, I still need to keep the configuration of each of them in mind, since there is no guarantee that my task will be submitted.
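One way to reduce the misconfiguration risk without an orchestrator would be to check the set of shard settings before deploying. The sketch below is only an illustration: `shardConfig` and `checkCoverage` are hypothetical names, not part of gojob, and the check assumes every shard index from 0 to numShards-1 must be served by exactly one executor.

```go
package main

import "fmt"

// shardConfig mirrors the two values passed to WithShard and WithNumShards.
// It is a hypothetical helper type, not part of gojob.
type shardConfig struct{ shard, numShards int }

// checkCoverage returns an error unless the configs use one consistent
// numShards and cover every shard index in [0, numShards) exactly once.
func checkCoverage(configs []shardConfig) error {
	if len(configs) == 0 {
		return fmt.Errorf("no shard configurations given")
	}
	n := configs[0].numShards
	if n <= 0 {
		return fmt.Errorf("numShards must be positive, got %d", n)
	}
	seen := make([]bool, n)
	for _, c := range configs {
		if c.numShards != n {
			return fmt.Errorf("inconsistent numShards: %d vs %d", c.numShards, n)
		}
		if c.shard < 0 || c.shard >= n {
			return fmt.Errorf("shard %d out of range [0, %d)", c.shard, n)
		}
		if seen[c.shard] {
			return fmt.Errorf("shard %d is configured twice", c.shard)
		}
		seen[c.shard] = true
	}
	for i, ok := range seen {
		if !ok {
			return fmt.Errorf("shard %d of %d has no executor", i, n)
		}
	}
	return nil
}

func main() {
	complete := []shardConfig{{0, 4}, {1, 4}, {2, 4}, {3, 4}}
	fmt.Println("complete set:", checkCoverage(complete))

	// Only one executor was started, so three quarters of the tasks
	// would never run under the assumed coverage rule.
	partial := []shardConfig{{1, 4}}
	fmt.Println("partial set:", checkCoverage(partial))
}
```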

emar-kar commented 4 months ago

Maybe it's better to return an error from Submit?

UPD: I got the main idea of sharding and see its purpose now, but I would prefer to at least get some flexible control over task submission, so I can, for example, do something like

for _, ex := range executors {
    if err := ex.Submit(myTask); err != nil {
        // log error and continue
    } else {
        break
    }
}

WangYihang commented 4 months ago

Update: Sharding can be disabled by simply omitting the sharding-related options when constructing the scheduler.

scheduler := gojob.New()

I will create a new example and a doc file to illustrate the sharding mechanism later.