Closed — WaDelma closed this 6 years ago
This looks really interesting! I can't say when I'll have time to review it; feel free to ping me in case I forget.
@csherratt I would really appreciate it if you could also take a look at this PR. @WaDelma Given that I'm very busy at the moment, it's possible that I can't review this until Friday or Saturday.
Np, I am not in a hurry!
I would do benchmarking, but my main PC died and I am using crummy laptop.
This PR looks good to me, thank you! Would still be nice if @kvark or @csherratt could take a look.
Reviewed 2 of 2 files at r4. Review status: all files reviewed at latest revision, 1 unresolved discussion.
Comments from Reviewable
Does the benchmark show a difference? If not, we should have a benchmark that does.
Ah right, we need a benchmark!
I asked for one first, but forgot it then.
Benchmark results on AMD Ryzen 5 1600 (6 cores / 12 SMT x 3.2 GHz) 64bit Fedora 27
par_n is the parallel iterator that splits the n highest levels.
There seems to be quite a bit of overhead from parallelism, but once there are enough bits it's more efficient.
I didn't notice any change between the old splitting strategy and the new one, but the new one should work better if the bitset is unevenly distributed. (I should probably figure out a benchmark for that.)
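To illustrate what "splitting of the n highest levels" means, here is a hedged sketch using plain std threads: a range of work is only allowed to split `levels` times before it falls back to sequential processing. This is a hypothetical simplification — the real iterator uses rayon and the bitset's layer structure, and `process` is an invented name:

```rust
use std::thread;

// Depth-limited splitting: a range is recursively halved at most `levels`
// times; below that depth the work runs sequentially on one thread.
fn process(range: std::ops::Range<u64>, levels: u32) -> u64 {
    // Below the depth limit (or for tiny ranges) just run sequentially.
    if levels == 0 || range.end - range.start < 2 {
        return range.sum();
    }
    let mid = (range.start + range.end) / 2;
    let (lo, hi) = (range.start..mid, mid..range.end);
    // Split once and recurse, consuming one level of splitting budget.
    let handle = thread::spawn(move || process(hi, levels - 1));
    let left = process(lo, levels - 1);
    left + handle.join().unwrap()
}

fn main() {
    // With levels = 3 at most 2^3 - 1 = 7 extra threads are spawned.
    let total = process(0..1_000, 3);
    println!("{}", total); // prints 499500 (sum of 0..1000)
}
```

The depth cap is what par_1/par_2/par_3 vary in the benchmarks below: more levels means more parallelism but also more task-spawning overhead.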
| | seq | Par_3 | Par_2 | Par_1 |
|---|---|---|---|---|
| 100 | 690 | 19850 | 19045 | 9721 |
| 1000 | 5611 | 34946 | 27457 | 12587 |
| 10000 | 48782 | 56505 | 34214 | 32321 |
| 100000 | 331948 | 101915 | 66843 | 159194 |
| 1000000 | 1162002 | 234203 | 201314 | 614811 |
*(Plots: log-log scale and linear scale.)*
After sleeping on it I also realised that the benchmark is flawed: the per-iteration computation is minimal, which means it's mostly measuring the overhead of rayon.
I need to make a benchmark with more realistic per-iteration computation, as that should make splitting on all layers more efficient. I should probably also benchmark with unbalanced per-iteration computation.
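The "payload" idea — giving each visited index a configurable amount of busy work so the measurement isn't dominated by iterator overhead — could look roughly like this. This is an illustrative sketch, not the actual benchmark code; `payload` and the work sizes are made up:

```rust
use std::time::Instant;

// Busy work standing in for per-entity computation: sums squares of 0..n.
fn payload(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, i| acc.wrapping_add(i * i))
}

fn main() {
    // Vary the payload size the way the payload_100/1000/10000 columns do.
    for &work in &[0u64, 100, 1_000] {
        let start = Instant::now();
        let mut sum = 0u64;
        for i in 0..10_000u64 {
            sum = sum.wrapping_add(payload(work)).wrapping_add(i);
        }
        // The checksum keeps the compiler from optimising the loop away.
        println!("payload={:5} elapsed={:?} checksum={}", work, start.elapsed(), sum);
    }
}
```

With zero payload the timing reflects iteration overhead alone; larger payloads shift the balance toward the per-entity work, which is what makes deeper splitting pay off.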
I benchmarked different levels of splitting with different payloads. It seems that if there is per-entity work, then the number of layers split on matters less. I reverted it to split on all 3 levels, as that seems more stable and efficient when you have lots of entities (and I think some rayon blog post mentioned that you should split as much as you can... not sure about this though).
Benchmarking is annoying. I just realised I should probably also benchmark with different numbers of cores.
| | Par_2 | Par_3 | Par_2_payload_100 | Par_3_payload_100 | Par_2_payload_1000 | Par_3_payload_1000 | Par_2_payload_10000 | Par_3_payload_10000 |
|---|---|---|---|---|---|---|---|---|
| 100 | 19048 | 20035 | 20771 | 21419 | 25701 | 30633 | 87458 | 126330 |
| 1000 | 27744 | 35203 | 34103 | 43724 | 130098 | 100184 | 1063586 | 647348 |
| 10000 | 34135 | 57760 | 142638 | 135950 | 1050928 | 684282 | 6240367 | 6146004 |
| 100000 | 67404 | 100416 | 800828 | 818030 | 6189107 | 6077761 | 59545648 | 58536681 |
| 1000000 | 200724 | 235456 | 7727451 | 5184305 | 41417368 | 40561180 | 403868928 | 393994208 |
@WaDelma what's the status of this PR?
Review status: 0 of 3 files reviewed at latest revision, 5 unresolved discussions.
As long as you are happy with my benchmarking and don't find anything else wrong with the PR, I think this should be ready.
@kvark Are you happy with this PR? Or do we still need some more diagrams and comments?
Whoops... Tried to clear a comment and accidentally clicked Close and comment
.
As @kvark said, it would be nice to have the commits squashed or reorganized. Otherwise this is really great, so please r=me once you've cleaned up the commits.
bors delegate=WaDelma
:v: WaDelma can now approve this pull request
Should I remove some of the benchmarks? They seem to slow CI down.
Also I am not that happy with average_ones, as its results are a bit unintuitive: 01001 is split into 01, 001 and not 0100, 1. It doesn't affect its use, but it makes for a weird API and explanation. Dunno if it's worth changing.
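To see why the split lands there: if average_ones returns (roughly) the average index of the set bits, then for 01001 — ones at indices 0 and 3, lowest bit on the right — the average rounds to 2, cutting the string between "01" and "001" rather than at its midpoint. This is an illustrative sketch only; the function signature and the rounding rule here are assumptions chosen to reproduce the example, not hibitset's actual implementation:

```rust
// Sketch: return the average index of the set bits in `mask` (rounded up),
// or None if there are fewer than two ones and nothing to split.
fn average_ones(mask: u32) -> Option<u32> {
    let count = mask.count_ones();
    if count < 2 {
        return None;
    }
    let mut sum = 0u32;
    let mut m = mask;
    while m != 0 {
        sum += m.trailing_zeros(); // index of the lowest remaining one
        m &= m - 1;                // clear that bit
    }
    Some((sum + count - 1) / count) // average index, rounded up
}

fn main() {
    let mask = 0b01001u32; // ones at indices 0 and 3
    let split = average_ones(mask).unwrap();
    let low = mask & ((1 << (split + 1)) - 1);
    let high = mask >> (split + 1);
    // Splits "01001" into "01" (high part) and "001" (low part),
    // not "0100" and "1" as a midpoint split would.
    println!("split={} high={:02b} low={:03b}", split, high, low);
    // prints: split=2 high=01 low=001
}
```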
Follow up is fine. As for benches - I don't think we should rely on CI anyway, so we can turn the par iter off there.
bors r=me
@WaDelma You can approve a PR for me by writing "bors r=torkleyy". (That's what I meant by saying "r=me").
ah, sorry. Don't really know how bors works :P
bors r-
bors r=torkleyy
Improves the parallel iterator's splitting behaviour and refactors it.
The improvement: before, 1000000011111111 was split into 10000000 and 11111111; after, it is split into 10000000111 and 11111, so the set bits are divided roughly evenly between the two parts.
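A hedged illustration of the new rule (not the actual hibitset code, and `balanced_split` is an invented name): instead of always cutting at the middle of the word, pick the split point where about half of the set bits land in each part.

```rust
// Find the split point where roughly half of the set bits fall into the
// low part. Returns (high part, low part, split position).
fn balanced_split(mask: u16) -> (u16, u16, u32) {
    let total = mask.count_ones();
    let mut seen = 0;
    for shift in 1u32..16 {
        if mask & (1u16 << (shift - 1)) != 0 {
            seen += 1;
        }
        // Stop once the low part holds at least half of the ones.
        if 2 * seen >= total {
            let low = mask & ((1u16 << shift) - 1);
            let high = mask >> shift;
            return (high, low, shift);
        }
    }
    (0, mask, 16)
}

fn main() {
    // The PR's example: 1000000011111111 has 9 set bits.
    let (high, low, shift) = balanced_split(0b1000000011111111);
    // Old behaviour cut at bit 8: 10000000 | 11111111 (1 vs 8 ones).
    // New behaviour cuts at bit 5: 10000000111 | 11111 (4 vs 5 ones).
    println!("high={:b} low={:b} shift={}", high, low, shift);
    // prints: high=10000000111 low=11111 shift=5
}
```

Balancing the ones rather than the bit width is what makes the parallel iterator split evenly on unevenly distributed bitsets.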