SpongePowered / Sponge

The SpongeAPI implementation targeting vanilla Minecraft and 3rd party platforms.
MIT License
390 stars 211 forks source link

New scheduling mechanism #3954

Open sunmisc opened 8 months ago

sunmisc commented 8 months ago

This collaboration with @icelimetea

Major change: The new scheduler minimizes O(n) per iteration of tasks on every tick, which is achieved thanks to the binary heap. Since we now have a time ordering of tasks, we can stop iterating on tasks when we are sure that the next tasks will definitely not be executed at the current time Worst-case performance: O(n), Best-case performance: O(1) enqueuing/dequeuing - O(log(n))

This has also helped AsyncScheduler, now each worker in the pool can take tasks from the queue independently, which will improve efficiency.

The basic operations are to delay the start of a task, and then a periodic task can be represented as a recursion

schedule(() -> {
     // do something
     schedule(() -> {
            // recursive
     }, 2)
}, 5)

This is more flexible if you want to represent the period dynamically (first 5 seconds, so every 2 and every 1 second).

ABOUT API:

There are still "rough edges" in the current implementation (e.g. package-private), which I think the API changes will fix.

aromaa commented 8 months ago

The scheduler ideally should be reworked to use PriorityQueue<T> to make this easier and more lightweight. Alternatively you could use TreeMap<K, V> to keep everything in one collection but you HAVE TO remove and add it back to the collection after every task execution or you have a broken tree.

sunmisc commented 8 months ago

The scheduler ideally should be reworked to use PriorityQueue<T> to make this easier and more lightweight. Alternatively you could use TreeMap<K, V> to keep everything in one collection but you HAVE TO remove and add it back to the collection after every task execution or you have a broken tree.

I'm already working on it

icelimetea commented 8 months ago

I've also been thinking about Sponge's scheduler (and discussing it together with @sunmisc).

Would it be a good idea to change the API in a way so asynchronous scheduler accepts delays and intervals only in real time units (nanoseconds/milliseconds/seconds/etc) and synchronous scheduler accepts those parameters only in ticks?

If you are specifying a task interval in seconds, synchronous scheduler can't really properly handle it due to the fact that the tick length might change depending on server load. Similarly, asynchronous scheduler has to "guess" the tick length and therefore can't reliably handle ticks too.

Such a big API change, however, can break a lot of things, so not sure about that. :\

aromaa commented 8 months ago

Just quickly taking a look at this and I'm not entirely sure what's the reasoning for the new interfaces? Also the formatting is all over the place. We don't use wildcard imports, keep field definitions one per line and always use brackets. The spacing is also inconsistent.

To comment more on the approach itself and on the concerns about the separation of the tick and time based itself. One could easily draw the line where the sync scheduler always uses tick based scheduling and the async one uses wall clock time. Whats definitely wrong here is that the sync scheduler should not ever use wall clock time when ticks are specified, that is just going to cause many subtle surprises, inconsistent and wrong behavior as the base game is tick based and many actions depend on it.

One could easily and efficiently still support the both cases for both schedulers by just scheduling the tick based tasks to their own collection and the wall clock based to other. That way you can easily pull the most relevant tick and wall clock based tasks.

sunmisc commented 8 months ago

formatting tried to fix it.

I didn't break the tics, the two methods handle time differently:

private final Time ticked = new Time() {
    @Override
    public boolean tickBased() {
        return true;
    }

    @Override
    public long timeNanos() {
        VarHandle.acquireFence();
        return SyncScheduler.this.timestamp;
    }
}

Interfaces are used only package-private for internal logic

I didn't suggest removing ticks. I just wanted to change the design. So far it's weird that the synchronous scheduler handles something other than ticks, yet you want to have real-time scheduling in the main thread

aromaa commented 8 months ago

I didn't break the tics, the two methods handle time differently

I meant that they are are effectively no longer based on ticks but on wall clock time so if the server is running behind it would fire too early.

So far it's weird that the synchronous scheduler handles something other than ticks, yet you want to have real-time scheduling in the main thread

Not sure I follow? Its mostly about the difference between timing. If you schedule 20 ticks it would wait 20 ticks. If you schedule 1s it could be less than 20 ticks.

sunmisc commented 8 months ago

I didn't break the tics, the two methods handle time differently

I meant that they are are effectively no longer based on ticks but on wall clock time so if the server is running behind it would fire too early.

So far it's weird that the synchronous scheduler handles something other than ticks, yet you want to have real-time scheduling in the main thread

Not sure I follow? Its mostly about the difference between timing. If you schedule 20 ticks it would wait 20 ticks. If you schedule 1s it could be less than 20 ticks.

I didn't break the main logic based on real time and ticks. All behavior is the same as the old implementation. The timestamp is still used in SyncScheduler (if the task is specified in ticks). Or have I made a mistake somewhere?

aromaa commented 8 months ago

I didn't break the main logic based on real time and ticks.

Ah I see now, you are changing how the time is calculated there. I don't like how subtle that is, also its overly complex compared to how you could more easily simplify the implementation with just simple PriorityQueue<T>. The DelayQueue<T> is already using PriorityQueue<T> under the hood so you are not losing anything there perf wise.

It would be preferred to keep these changes to minimal here to make it easier to review as touching any of this code is very fragile and hard to get right. Improvements can be done incrementally more easily. There are few tricks you are trying to play to try to be clever and I'm not really fan of it. For example the VarHandle usage in ScheduledTaskEnvelope or the usage of System#identityHashCode.

Generally retrieving the identity hash code has bad performance characteristics and the JVM has to deoptimize various operations on the object itself. There is also pending JVM optimizations to try to make the object header smaller which would hurt in this case when the hash code has to be generated. In this case it adds zero benefit and is just extra overhead.

sunmisc commented 8 months ago

I didn't break the main logic based on real time and ticks.

Ah I see now, you are changing how the time is calculated there. I don't like how subtle that is, also its overly complex compared to how you could more easily simplify the implementation with just simple PriorityQueue<T>. The DelayQueue<T> is already using PriorityQueue<T> under the hood so you are not losing anything there perf wise.

It would be preferred to keep these changes to minimal here to make it easier to review as touching any of this code is very fragile and hard to get right. Improvements can be done incrementally more easily. There are few tricks you are trying to play to try to be clever and I'm not really fan of it. For example the VarHandle usage in ScheduledTaskEnvelope or the usage of System#identityHashCode.

Generally retrieving the identity hash code has bad performance characteristics and the JVM has to deoptimize various operations on the object itself. There is also pending JVM optimizations to try to make the object header smaller which would hurt in this case when the hash code has to be generated. In this case it adds zero benefit and is just extra overhead.

Yes, we could use PriorityQueueue, but then we would have to check for getDelay() <= 0 DelayQueue delegates to PriorityBlockingQueue

VarHandle is not for beauty! Atomic classes have a small footprint. Since they are regular classes, I used atomic classes in places with rare allocation, and for creating tasks I tried to save money.

However, identityHashCode is faster than randomUUID (because it uses SecureRandom), but I did this to eliminate a VERY rare bug where randomUUID could produce a non-unique result.

All these micro-optimizations are benchmarked, and they are really faster, but if they bother your eyes, you can give them up.

Benchmark                         Mode  Cnt       Score       Error   Units
IncBench.randomUUID              thrpt    4    2552.546 ±   520.853  ops/ms
IncBench.uuidByIdentityHashCode  thrpt    4  330383.749 ± 60948.532  ops/ms
aromaa commented 8 months ago

Yes, we could use PriorityQueueue, but then we would have to check for getDelay() <= 0

That's not a problem, the implementation would still be more cleaner.

VarHandle is not for beauty! Atomic classes have a small footprint.

These savings are minimal so not worth it here.

However, identityHashCode is faster than randomUUID (because it uses SecureRandom)

I'm not sure what you are arguing about? Insecure random is obviously faster? That's correctness vs speed, oranges to apples. Also the identity hash code has side effects which does not show up in micro benchmarks. You could instead generate random long for the high bits and then use incrementing id for the low bits, that basically eliminates your concern.

but I did this to eliminate a VERY rare bug where randomUUID could produce a non-unique result.

This is the birthday problem, very unlikely here but quite "easy" to fix with better alternative :)

sunmisc commented 8 months ago

Yes, we could use PriorityQueueue, but then we would have to check for getDelay() <= 0

That's not a problem, the implementation would still be more cleaner.

VarHandle is not for beauty! Atomic classes have a small footprint.

These savings are minimal so not worth it here.

However, identityHashCode is faster than randomUUID (because it uses SecureRandom)

I'm not sure what you are arguing about? Insecure random is obviously faster? That's correctness vs speed, oranges to apples. Also the identity hash code has side effects which does not show up in micro benchmarks. You could instead generate random long for the high bits and then use incrementing id for the low bits, that basically eliminates your concern.

but I did this to eliminate a VERY rare bug where randomUUID could produce a non-unique result.

This is the birthday problem, very unlikely here but quite "easy" to fix with better alternative :)

I have changed everything you mentioned except DelayQueue -> PriorityQueue because that requires you to create two more locks, which complicates the code rather than simplifies it.

You can use one lock for two queues because the main operation happens in one thread, and there is very little competition for adding tasks from other threads, but there will still be more code.

However, I don't care (don't take this as rude), I changed the main logic, everything else is secondary. I can make changes based on your recommendations.

aromaa commented 8 months ago

I have changed everything you mentioned except DelayQueue -> PriorityQueue

The DelayQueue is fine for the wall clock but I was hoping to get rid of the Time interface and see PriorityQueue<T> for the tick based tasks. Sorry, I was not very clear what I meant. I think its better to avoid the abstraction here and make it much clearer what the intend here is.

I think the following shape would be a bit more ideal:

private final PriorityQueue<ScheduledTickBasedTask> tickTasks;
private final DelayQueue<ScheduledTimeBasedTask> timeTasks;

static class ScheduledTickBasedTask {
    private final SpongeTask task;
    private final long startTick; //TICKS!

    public void run() {
        //Run the task
        //Reschedule if repeating
    }
}

static class ScheduledTimeBasedTask {
    private final SpongeTask task;
    private final long startTime; //Wall clock!

    public void run() {
        //Run the task
        //Reschedule if repeating
    }
}
sunmisc commented 8 months ago

I have changed everything you mentioned except DelayQueue -> PriorityQueue

The DelayQueue is fine for the wall clock but I was hoping to get rid of the Time interface and see PriorityQueue<T> for the tick based tasks. Sorry, I was not very clear what I meant. I think its better to avoid the abstraction here and make it much clearer what the intend here is.

I think the following shape would be a bit more ideal:

private final PriorityQueue<ScheduledTickBasedTask> tickTasks;
private final DelayQueue<ScheduledTimeBasedTask> timeTasks;

static class ScheduledTickBasedTask {
    private final SpongeTask task;
    private final long startTick; //TICKS!

    public void run() {
        //Run the task
        //Reschedule if repeating
    }
}

static class ScheduledTimeBasedTask {
    private final SpongeTask task;
    private final long startTime; //Wall clock!

    public void run() {
        //Run the task
        //Reschedule if repeating
    }
}

I use the Time interface not only here, but also in the SpongeTask class, so I decided to abstract it, and DelayQueue does not do any heuristics with System.nanoTime(). This queue only interacts with the getDelay() method.

aromaa commented 8 months ago

I use the Time interface not only here, but also in the SpongeTask class, so I decided to abstract it, and DelayQueue does not do any heuristics with System.nanoTime(). This queue only interacts with the getDelay() method.

You could just use System.nanoTime() directly on the wall clock based queue and on the tick based one get the current tick the server is on.

By doing the rescheduling inside the ScheduledTickBasedTask and ScheduledTimeBasedTask it should be straight forward to reschedule it without any extra abstraction?