Any idea how this impacts performance?
On the performance comment. One of my concerns is about surrounding the execution of each task with timer calls. In my experience, timer calls aren't free and I've been burned by this sort of thing in the past (accidentally creating code that becomes "timer-bound" if you will). The uvloop document referenced makes a note about avoiding too many timer-related system calls as well.
With that in mind, I might be inclined to make the timer-related warning optional.
Maybe just relax the timer to measure only the total time of each loop cycle instead of per-task? Sure, we would lose some information, but that's less overhead, IMHO.
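For concreteness, a rough sketch of what cycle-level timing could look like (the names and loop structure here are hypothetical, not curio's actual kernel code):

```python
import logging
from time import monotonic

MAX_CYCLE_TIME = 0.05  # hypothetical threshold, in seconds

def run_cycle(ready, poll_for_io):
    # Two timer calls per loop cycle instead of two per task.
    start = monotonic()
    poll_for_io()
    for _ in range(len(ready)):
        task = ready.popleft()
        task.run()
    elapsed = monotonic() - start
    if elapsed > MAX_CYCLE_TIME:
        logging.warning('Event loop cycle took %.3f seconds', elapsed)
```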
I did some looking at the gevent source to see how it handles this. So far as I can tell, it is NOT batching tasks around the event loop quite like this. If I'm reading the source correctly (the _run_callbacks() method in https://github.com/gevent/gevent/blob/master/src/gevent/libev/corecffi.py), it runs all available callbacks (including new ones added by run_callback()), but with a counter: after 1000 callbacks, it goes back to the event loop.
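In sketch form, the pattern looks something like this (paraphrased from my reading; see the linked corecffi.py for the real thing):

```python
CALLBACK_LIMIT = 1000  # bail back to the event loop after this many

def run_callbacks(callbacks):
    # Drain the queue, including callbacks appended while we're
    # draining, but stop after a fixed count so the event loop
    # gets a turn.
    count = 0
    while callbacks and count < CALLBACK_LIMIT:
        cb, args = callbacks.popleft()
        cb(*args)
        count += 1
    # Anything left over is handled on the next loop iteration.
```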
Looking at asyncio, it appears to be doing some kind of batching similar to what's seen in this PR. The implementation is a bit simpler. It simply computes the length of the ready queue when it starts scheduling and only schedules that many tasks--regardless of whether or not more tasks get added to the queue in the process. I think I like that better (you don't need to manage two separate queue objects).
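In sketch form (paraphrasing the idea in asyncio's _run_once(), not its literal code):

```python
def run_ready(ready):
    # Snapshot the queue length up front; tasks that get scheduled
    # while we're running callbacks stay queued until after the next
    # I/O poll, even though they land on this same queue.
    ntodo = len(ready)
    for _ in range(ntodo):
        task = ready.popleft()
        task.run()
```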
asyncio also includes some code to check for slow callbacks using timers. However, that code is only enabled if it's running in debug mode.
I think we're making the Kernel.run method pretty fat with all those changes to how we run a single task. Maybe it's time to refactor it into something like Kernel._run_single_task?
I've created a wiki page where we can put some description of curio's event loop internals. I think we need some general principles written down, to keep the direction of the event loop's development from fluctuating chaotically.
Although the Kernel.run() method is a bit fat, I also consider it to be fairly performance critical. All things being equal, I'd rather it be fast than overly abstracted.
@dabeaz, OK, if we stick with keeping all functions in a local namespace, we can at least introduce something like _run_task, similar to the existing _cleanup_task, so that at least this code lives in one place. Otherwise, we might eventually end up with barely maintainable code. Let's keep it fat but structured in some way. I deeply believe that maintainability is crucial for curio.
The implementation is a bit simpler. It simply computes the length of the ready queue when it starts scheduling and only schedules that many tasks--regardless of whether or not more tasks get added to the queue in the process.
Oo, that is clever. Rewrote this to use that trick.
Any idea how this impacts performance?
I just added a new commit to this PR with the beginnings of a benchmark suite using asv. Seems like a useful thing to have in general. And I added two tiny pure-scheduler microbenchmarks. (Sorry this PR is getting so unfocused -- I can split it up if you want.)
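For anyone who hasn't used asv: it discovers plain functions named time_* and times them, so a pure-scheduler microbenchmark is only a few lines. Something along these lines (illustrative only, not the exact benchmarks in the commit):

```python
# benchmarks/benchmarks.py -- asv picks up functions named time_*
import curio

async def _yield_many(n):
    # Pure-scheduler workload: bounce through the kernel n times
    # without doing any real I/O.
    for _ in range(n):
        await curio.sleep(0)

def time_scheduler_yield():
    curio.run(_yield_many(10000))
```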
I then tried using these microbenchmarks to compare the version with the time_monotonic calls against the version without, and wasn't able to get any reliable effect -- the impact seems to be smaller than the trial-to-trial noise. I also tried using perf, which works harder to tease out small differences, and it told me that on average the version with the time_monotonic calls was faster. So I can't put a precise number on it, but it seems that even on a microbenchmark that does nothing but scheduling, the slowdown is pretty small (less than, say, 10%?), or I'd have been able to detect it :-).
I just added a new commit to this PR with the beginnings of a benchmark suite using asv
This actually looks really promising. Any idea how we can incorporate the graphs with results into curio's documentation page or somewhere publicly available? It would be really helpful to know the impact that each PR will have on performance before merging it (without having to run the benchmarks against the target branch in your local env to check this).
Well, if we want public results pages, then someone has to set up a server to run the benchmarks and generate the pages. The problem is that benchmarks need a fairly stable, quiet system, so you can't really use travis-ci or similar services. ASV is set up to support fairly ad hoc benchmark running (you can do things like ask it, "please benchmark all commits that are new since the last time I ran this," and then upload the results files to a server). But it isn't as strong at measuring single commits as it is at detecting trends over time, and it's hard to have an automated system run benchmarks on PRs because of the security issues around running untrusted code. In the meantime, it's certainly possible to use ASV to test a PR locally!
I'm going to merge this so that it doesn't languish on Github. I want to modify it so that the time check is optional though--only enabled if in a debugging mode or if set to a non-zero value.
This PR makes curio's scheduling proceed in strict batches: each batch starts with polling for I/O, and if a task becomes runnable during the current batch, it won't actually run until the next one. Basically, this gives us the invariant that no single task can be scheduled twice without checking for I/O in between.
This prevents various obscure pathologies (see gh-112), and brings us in line with how most (all?) other event loops work; see for example: http://docs.libuv.org/en/v1.x/design.html#the-i-o-loop
It also adds a warning if any individual task blocks the event loop for more than 50 ms. This value is pretty arbitrary, but at least it's a start. Combined with the above change, this should at least be sufficient to detect any gratuitous starvation issues.
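Putting the two changes together, the kernel's main loop now works roughly like this (a simplified sketch, not the literal code in the PR):

```python
import logging
from time import monotonic

MAX_TASK_TIME = 0.05  # warn if one task holds the loop for more than 50 ms

def kernel_loop(ready, poll_for_io):
    while True:
        poll_for_io()  # every batch starts by checking for I/O
        # Only run the tasks that were ready when the batch started; a
        # task scheduled during the batch waits for the next one, so no
        # task can run twice without an I/O check in between.
        for _ in range(len(ready)):
            task = ready.popleft()
            start = monotonic()
            task.run()
            elapsed = monotonic() - start
            if elapsed > MAX_TASK_TIME:
                logging.warning('Task %r blocked the event loop for %.3f seconds',
                                task, elapsed)
```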