amethyst / hibitset

Hierarchical bit set container

Improve parallel iterators splitting behaviour #19

Closed: WaDelma closed this 6 years ago

WaDelma commented 6 years ago

Improves the parallel iterator's splitting behaviour and refactors it.

The improvement:
Before: `1000000011111111` was split into `10000000` and `11111111`.
After: `1000000011111111` is split into `10000000111` and `11111`.
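
Roughly, the difference between the two strategies looks like this (an illustrative sketch only, not the PR's code; the helper names are made up and the real implementation works on the hierarchical layers, so its exact boundaries can differ slightly):

```rust
// Illustrative sketch only, not the crate's code: a toy model of the two
// splitting strategies, operating on the indices of the set bits.

/// Old behaviour: split the remaining index range at its midpoint,
/// ignoring where the set bits actually are.
fn split_midpoint(range: std::ops::Range<usize>) -> usize {
    range.start + (range.end - range.start) / 2
}

/// New behaviour, roughly: split near the average index of the set bits,
/// so both halves receive a comparable number of ones.
fn split_near_average_ones(ones: &[usize]) -> usize {
    ones.iter().sum::<usize>() / ones.len()
}

fn main() {
    // `1000000011111111` read left to right: ones at these indices.
    let ones = [0, 8, 9, 10, 11, 12, 13, 14, 15];

    // Old: boundary 8, leaving 1 one on the left and 8 on the right.
    println!("midpoint boundary: {}", split_midpoint(0..16));

    // New (toy version): boundary 10. The real average_ones works on the
    // hierarchical layers and lands at 11 in the example above, but either
    // way the ones end up divided roughly in half between the two sides.
    println!("average-ones boundary: {}", split_near_average_ones(&ones));
}
```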



torkleyy commented 6 years ago

This looks really interesting! I can't say when I'll have time to review it; in case I forget, feel free to ping me.

torkleyy commented 6 years ago

@csherratt I would really appreciate it if you could also take a look at this PR. @WaDelma Given that I'm very busy at the moment, it's possible that I won't be able to review this until Friday or Saturday.

WaDelma commented 6 years ago

Np, I am not in a hurry!

WaDelma commented 6 years ago

I would do some benchmarking, but my main PC died and I am using a crummy laptop.

torkleyy commented 6 years ago

This PR looks good to me, thank you! Would still be nice if @kvark or @csherratt could take a look.



kvark commented 6 years ago

Does the benchmark show a difference? If not, we should have a benchmark that does.

torkleyy commented 6 years ago

Ah right, we need a benchmark!

torkleyy commented 6 years ago

I asked for one at first, but then forgot about it.

WaDelma commented 6 years ago

Benchmark results on an AMD Ryzen 5 1600 (6 cores / 12 SMT threads @ 3.2 GHz), 64-bit Fedora 27.

par_n is the parallel iterator with splitting of the n highest layers.
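
For intuition, here is a back-of-the-envelope sketch of what splitting only the n highest layers means (my reading of the four-layer structure, assuming 64-bit usize blocks as on the benchmark machine; the constants and the par_n mapping are my interpretation, not code from the crate):

```rust
// Back-of-the-envelope sketch, not crate code: with 64-bit usize blocks,
// one set bit on a given layer summarises this many layer-0 indices.
// If splitting stops at that layer, that is the smallest index range a
// single rayon task can end up with.
const BITS: usize = 64; // bits per usize block

fn main() {
    let per_layer3_bit = BITS * BITS * BITS; // 262_144 indices
    let per_layer2_bit = BITS * BITS;        // 4_096 indices
    let per_layer1_bit = BITS;               // 64 indices

    println!("split 1 layer  (par_1): units of {} indices", per_layer3_bit);
    println!("split 2 layers (par_2): units of {} indices", per_layer2_bit);
    println!("split 3 layers (par_3): units of {} indices", per_layer1_bit);
}
```

Coarser splitting (par_1) creates fewer rayon tasks, which fits the overhead pattern for small bitsets in the numbers below.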

There seems to be quite a bit of overhead from parallelism, but once there are enough bits it's more efficient.

I didn't notice any change between the old splitting strategy and the new one, but the new one should work better if the bitset is unevenly distributed. (I should probably figure out a benchmark for that.)

|         | seq     | Par_3  | Par_2  | Par_1  |
|--------:|--------:|-------:|-------:|-------:|
| 100     | 690     | 19850  | 19045  | 9721   |
| 1000    | 5611    | 34946  | 27457  | 12587  |
| 10000   | 48782   | 56505  | 34214  | 32321  |
| 100000  | 331948  | 101915 | 66843  | 159194 |
| 1000000 | 1162002 | 234203 | 201314 | 614811 |

(Charts: iter benchmarks on log-log and linear scales.)

WaDelma commented 6 years ago

After sleeping on it, I also realised that the benchmark is flawed: the per-iteration computation is minimal, which means it's mostly measuring the overhead of rayon.

I need to make a benchmark with more realistic per-iteration computation, as that should make splitting on all layers more efficient. I should probably also benchmark with unbalanced per-iteration computation.
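
By "payload" I mean something like the sketch below (illustrative only, not the actual benchmark code; the payload helper, the cost knob, and the plain index range standing in for the bitset are all made up):

```rust
// Sketch only: a stand-in per-item payload, so the measurement isn't
// dominated by rayon's splitting/stealing overhead. A plain index range
// stands in for iterating the hierarchical bitset.
use rayon::prelude::*;
use std::hint::black_box;

/// Burn roughly `cost` units of CPU work per item (made-up helper).
fn payload(index: usize, cost: usize) -> usize {
    let mut acc = index;
    for _ in 0..cost {
        acc = acc.wrapping_mul(31).wrapping_add(7);
    }
    acc
}

fn main() {
    let n = 1_000_000;
    let cost = 1_000;
    (0..n).into_par_iter().for_each(|i| {
        black_box(payload(i, cost));
    });
}
```

With `cost = 0` this degenerates back into measuring mostly rayon's task-management overhead.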

WaDelma commented 6 years ago

I benchmarked different levels of splitting with different payloads. It seems that if there is per-entity work, there is less difference between the numbers of layers split on. I reverted it to splitting all 3 layers, as that seems more stable and efficient when you have lots of entities (and I think some rayon blog post mentioned that you should split as much as you can... not sure about this, though).

Benchmarking is annoying. I just realised I should probably also benchmark with different numbers of cores.

|         | Par_2  | Par_3  | Par_2_payload_100 | Par_3_payload_100 | Par_2_payload_1000 | Par_3_payload_1000 | Par_2_payload_10000 | Par_3_payload_10000 |
|--------:|-------:|-------:|------------------:|------------------:|-------------------:|-------------------:|--------------------:|--------------------:|
| 100     | 19048  | 20035  | 20771             | 21419             | 25701              | 30633              | 87458               | 126330              |
| 1000    | 27744  | 35203  | 34103             | 43724             | 130098             | 100184             | 1063586             | 647348              |
| 10000   | 34135  | 57760  | 142638            | 135950            | 1050928            | 684282             | 6240367             | 6146004             |
| 100000  | 67404  | 100416 | 800828            | 818030            | 6189107            | 6077761            | 59545648            | 58536681            |
| 1000000 | 200724 | 235456 | 7727451           | 5184305           | 41417368           | 40561180           | 403868928           | 393994208           |

(Chart: parallel payload iteration benchmark.)

torkleyy commented 6 years ago

@WaDelma what's the status of this PR?



WaDelma commented 6 years ago

As long as you are happy with my benchmarking and don't find anything else wrong with the PR, I think this should be ready.

torkleyy commented 6 years ago

@kvark Are you happy with this PR? Or do we still need some more diagrams and comments?

WaDelma commented 6 years ago

Whoops... I tried to clear a comment and accidentally clicked "Close and comment".

torkleyy commented 6 years ago

As @kvark said, it would be nice to have the commits squashed or reorganized. Otherwise this is really great, so please r=me once you've cleaned up the commits.

bors delegate=WaDelma

bors[bot] commented 6 years ago

:v: WaDelma can now approve this pull request

WaDelma commented 6 years ago

Should I remove some of the benchmarks? They seem to slow CI down.

Also, I am not that happy with average_ones, as its results are a bit unintuitive: 01001 is split into 01, 001 and not 0100, 1. It doesn't affect its use, but it makes for a weird API and explanation. Dunno if it's worth changing.
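
For reference, here is a toy calculation that reproduces the example (illustrative only, not the crate's code; it assumes the boundary lands near the integer mean of the set-bit indices, whereas the real average_ones works per layer):

```rust
// Illustrative only: why `01001` splits into `01` and `001`.
fn main() {
    // `01001` read left to right: ones at indices 1 and 4.
    let ones = [1usize, 4];
    let boundary = ones.iter().sum::<usize>() / ones.len(); // (1 + 4) / 2 = 2
    // Cutting before index 2 gives `01` | `001`, not the visually
    // "even-looking" cut `0100` | `1`.
    println!("boundary: {}", boundary);
}
```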

kvark commented 6 years ago

A follow-up is fine. As for the benches: I don't think we should rely on CI for them anyway, so we can turn the par iter benches off there.


WaDelma commented 6 years ago

bors r=me

torkleyy commented 6 years ago

@WaDelma You can approve a PR for me by writing "bors r=torkleyy". (That's what I meant by saying "r=me").

WaDelma commented 6 years ago

ah, sorry. Don't really know how bors works :P

WaDelma commented 6 years ago

bors r-

bors[bot] commented 6 years ago

Canceled

WaDelma commented 6 years ago

bors r=torkleyy

bors[bot] commented 6 years ago

Build succeeded