joni-herttuainen opened this issue 2 years ago
Thank you for quantifying this - I've always assumed it's a problem, but never got around to looking at it.
Dispatching many small reads vs coalescing them makes a huge difference, I will have to give that a try sometime.
I couldn't help but try: https://github.com/BlueBrain/libsonata/pull/226
I think this could use the same treatment as the report API: nearby or consecutive ids/elements could be merged into chunks that aggregate reads and hit the file system less often. Noting this as it may be the underlying issue we see in SPIND-235.
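The merging idea can be sketched roughly like this (a minimal sketch; `merge_ranges` and the `gap` threshold are illustrative names, not libsonata's actual API):

```python
# Hypothetical sketch of the chunking idea: merge [start, stop) ranges
# whose separation is below a gap threshold, so nearby ids are served
# by one larger read instead of many small ones.

def merge_ranges(ranges, gap):
    """Merge [start, stop) ranges separated by less than `gap`."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] < gap:
            # close enough to the previous chunk: extend it
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(r) for r in merged]

print(merge_ranges([(0, 2), (3, 5), (100, 101)], gap=10))
# -> [(0, 5), (100, 101)]
```

The aggregated read then covers each merged chunk once, and the originally requested elements are picked out of the chunk in memory.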
I should have closed this with https://github.com/BlueBrain/libsonata/pull/226, but we can reuse it for this.
I'm wondering if we could collapse the new read-aggregation logic in the edge population reading with the one that's in the report reading. Have you looked at that @matz-e? If not, I can have a gander at some point soon.
did a quick and dirty version here: https://github.com/BlueBrain/libsonata/tree/try-chunked-pop-read
can you see if that helps w/ your use case @matz-e?
it doesn't work exactly how I'd want it to yet, but it seems faster-ish.
Thanks, @mgeplf! I'll try to test that soon™
It's pretty hacky, but at least it gives an idea of how things could be better.
If one sets the `GAP` environment variable, one can change the max chunk size.
A quick scan of values shows 1e6 to be a sweet spot:
```
(venv39) gevaert@r2i1n34 ~/private/src/libsonata (try-chunked-pop-read *)$ for GAP in 32 2048 65536 1048576 16777216 536870912 ; do export GAP; echo $GAP; /usr/bin/time python3 bench-read-pop.py; done
32
18.18user 8.88system 0:30.46elapsed 88%CPU (0avgtext+0avgdata 107316maxresident)k
0inputs+0outputs (0major+23543minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 18.19s user 8.89s system 88% cpu 30.465 total
2048
12.88user 8.76system 0:26.28elapsed 82%CPU (0avgtext+0avgdata 107312maxresident)k
0inputs+0outputs (0major+23548minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 12.88s user 8.77s system 82% cpu 26.286 total
65536
8.87user 8.91system 0:24.26elapsed 73%CPU (0avgtext+0avgdata 107324maxresident)k
0inputs+0outputs (0major+23546minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 8.87s user 8.92s system 73% cpu 24.271 total
1048576
8.22user 8.55system 0:23.31elapsed 71%CPU (0avgtext+0avgdata 107320maxresident)k
0inputs+0outputs (0major+23544minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 8.23s user 8.56s system 71% cpu 23.314 total
16777216
10.57user 8.96system 0:25.92elapsed 75%CPU (0avgtext+0avgdata 134480maxresident)k
0inputs+0outputs (0major+30459minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 10.57s user 8.97s system 75% cpu 25.932 total
536870912
17.01user 9.39system 0:33.24elapsed 79%CPU (0avgtext+0avgdata 558768maxresident)k
0inputs+0outputs (0major+144125minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py 17.02s user 9.39s system 79% cpu 33.253 total
```
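The shape of that curve can be modeled with a toy chunk count (synthetic ids, hypothetical numbers, nothing to do with the real benchmark data): a tiny gap leaves many chunks and thus many small reads, while a huge gap merges everything into few chunks that cover mostly-unneeded data and inflate memory, matching the `maxresident` growth above.

```python
# Toy model of chunk count vs. gap threshold for a scattered id set.
import numpy as np

rng = np.random.default_rng(0)
# 50k scattered ids drawn from a 10M-element population (made-up sizes)
ids = np.sort(rng.choice(10_000_000, size=50_000, replace=False))

for gap in (32, 2048, 65_536, 1_048_576, 16_777_216):
    # a new chunk starts wherever consecutive ids are more than `gap` apart
    n_chunks = 1 + int(np.count_nonzero(np.diff(ids) > gap))
    print(f"gap={gap:>10}  chunks={n_chunks}")
```

With one read per chunk, the chunk count (and so the syscall count) falls monotonically as the gap grows, while the total bytes covered by the chunks grow; the sweet spot is where the two costs cross.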
I stumbled on a case in which I have a big number of scattered (i.e., generally non-consecutive) edge ids. If I wrap the selection in `libsonata.Selection`, the call to `EdgePopulation.get_attribute(...)` takes a significant amount of time. For example, if I

the performance is significantly worse than doing:

There's a test case to demonstrate this effect in:

It can be run with the `run.sh` script. Example of the run output:

For smaller circuits, I guess doing `select_all` is not an issue, but for bigger ones, there might be concerns such as memory usage.
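The trade-off can be illustrated without libsonata at all (a toy with plain NumPy arrays standing in for the HDF5-backed attribute column; the real `Selection`/`get_attribute`/`select_all` calls are simplified away):

```python
# Toy illustration of the two access patterns: one small read per
# scattered id vs. one bulk read followed by in-memory indexing.
# The bulk approach is often faster for large scattered selections,
# at the cost of materializing the whole column (the memory concern).
import numpy as np

def read_scattered(column, ids):
    """Emulate one small read per id, as a scattered selection can cause."""
    return np.array([column[i] for i in ids])

def read_bulk_then_index(column, ids):
    """Emulate a select-all read followed by NumPy fancy indexing."""
    return np.asarray(column)[ids]

column = np.arange(1_000_000, dtype=np.float64)  # stand-in attribute column
ids = np.array([3, 17, 999_999, 42])             # scattered edge ids

print(read_bulk_then_index(column, ids).tolist())
# -> [3.0, 17.0, 999999.0, 42.0]
```

Both return the same values; what differs is how many reads hit the backing store and how much of the column ends up resident in memory.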