Open mdboom opened 1 year ago
(In particular the 53% number comes from Linux.)
> (In particular the 53% number comes from Linux.)
Yes -- good callout. On Windows, it's only 23% slower (and is more in the middle of other benchmarks, not the slowest outlier like on Linux).
I modified the benchmark so (a) we test different chunk sizes, and (b) we run with either the "do nothing function" in the current benchmark, or a function that calculates the factorial of 10.
As chunk size increases, nogil does better, but even at its best it is still about 40% slower.
When the threads each do actual work, the effect of nogil becomes visible: with the right chunk size it's 60% faster, but that benefit seems to top out as chunk size increases (I don't yet have an explanation for that).
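A hedged sketch of the kind of modification described above, for anyone who wants to reproduce it locally. The function names, item counts, and thread count here are my assumptions, not the actual modified benchmark; the only parts taken from the discussion are the two workload variants (a do-nothing function vs. `factorial(10)`) and varying `imap`'s `chunksize`:

```python
# Sketch of the modified benchmark (hypothetical parameters): vary imap's
# chunksize and swap between a do-nothing function and real per-item work.
import math
import time
from multiprocessing.pool import ThreadPool


def do_nothing(x):
    # Mirrors the original benchmark's trivial function: no real work.
    pass


def factorial_10(x):
    # Real (if small) CPU work per item, so the GIL actually matters.
    return math.factorial(10)


def bench(func, chunksize, n_items=10_000, n_threads=4):
    # Time one pass of imap over n_items with the given chunksize.
    with ThreadPool(n_threads) as pool:
        start = time.perf_counter()
        for _ in pool.imap(func, range(n_items), chunksize=chunksize):
            pass
        return time.perf_counter() - start


if __name__ == "__main__":
    for func in (do_nothing, factorial_10):
        for chunksize in (1, 10, 100, 1000):
            t = bench(func, chunksize)
            print(f"{func.__name__:>12}  chunksize={chunksize:<5} {t:.4f}s")
```

Comparing these timings between nogil and upstream builds is what produces the "40% slower at best" / "60% faster with the right chunk size" shape described above.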
It's fair to say this is not a great benchmark for measuring the effect of the GIL -- its intention (I would assume) was to measure the overhead of ThreadPool.imap. It's still interesting nonetheless, perhaps, as a significant unintentional regression.
Here's the benchmark, which we know is 53% slower on nogil-latest vs. upstream:
https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_concurrent_imap/run_benchmark.py#L19
It looks to be sending lists of ints of length 10 to each thread, and then calling a very simple function on each of the values. It's possible the coordination overhead of passing those lists of ints is dominating over the actual work. As a start, it might be interesting to see what happens as the length of the lists increases, or as the amount of work being done in the function increases.
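To make the hypothesis concrete, here is a hedged reconstruction of the benchmark's shape (see the linked `run_benchmark.py` for the real thing -- the names and counts below are my assumptions). Each task is a short list of ints, and the per-item function does essentially nothing, so thread-pool coordination may well dominate:

```python
# Hedged reconstruction of the benchmark's structure: imap over many small
# lists of ints, applying a trivial function to each element of each list.
from multiprocessing.pool import ThreadPool


def process_item(x):
    # Trivial per-item "work"; coordination overhead likely dominates this.
    return x


def process_chunk(chunk):
    # Each task receives a short list of ints and processes every element.
    return [process_item(x) for x in chunk]


def run(list_len=10, n_lists=1000, n_threads=4):
    # list_len controls data per task; raising it (or making process_item
    # more expensive) tests whether coordination overhead is the bottleneck.
    data = [list(range(list_len)) for _ in range(n_lists)]
    with ThreadPool(n_threads) as pool:
        for _ in pool.imap(process_chunk, data):
            pass
```

The two experiments suggested above map directly onto this sketch: grow `list_len` to amortize the per-task handoff, or replace `process_item` with something CPU-bound to let the threads do real work.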
Cc: @brandtbucher (as one who pointed this out).