BlueBrain / libsonata

A python and C++ interface to the SONATA format
https://libsonata.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

EdgePopulation.get_attribute with significant number of non-consecutive edge ids #225

Open joni-herttuainen opened 2 years ago

joni-herttuainen commented 2 years ago

I stumbled on a case in which I have a large number of scattered (i.e., generally non-consecutive) edge ids. If I wrap the selection in libsonata.Selection, the call to EdgePopulation.get_attribute(...) takes a significant amount of time.

For example, if I call

edge_pop.get_attribute("afferent_section_id", libsonata.Selection(edge_ids))

the performance is significantly worse than doing:

edge_pop.get_attribute("afferent_section_id", edge_pop.select_all())[edge_ids]

There's a test case to demonstrate this effect in:

/gpfs/bbp.cscs.ch/project/proj30/home/herttuai/libsonata_sel_test/

It can be run with run_test.sh. Example of the run output:

$ ./run_test.sh 
Autoloading python/3.9.7
Autoloading hpe-mpi/2.25.hmpt

Running test...
number of total edge_ids 267023294
number of wanted edge ids: 2244619
length of selection ranges: 2231745

Duration, 'select_all and take edge_ids': 0.31s
Duration, 'Selection(edge_ids)': 30.92s

For smaller circuits, I guess doing select_all is not an issue, but for bigger ones, there might be concerns such as memory usage.
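To make the "length of selection ranges" figure above concrete, here is a minimal sketch of how scattered ids map onto ranges. It assumes a selection is held as sorted, half-open `(start, stop)` ranges; `ids_to_ranges` is a hypothetical helper written for illustration, not a libsonata function. When ids are mostly non-consecutive, the number of ranges is almost the number of ids, which matches the 2231745 ranges for 2244619 ids reported in the run.

```python
def ids_to_ranges(ids):
    """Collapse sorted, unique integer ids into half-open (start, stop) ranges."""
    ranges = []
    for i in ids:
        if ranges and ranges[-1][1] == i:
            # This id directly follows the current run, so extend the run.
            ranges[-1][1] = i + 1
        else:
            # Gap before this id: start a new single-element range.
            ranges.append([i, i + 1])
    return [tuple(r) for r in ranges]


scattered = [3, 4, 10, 42, 43, 44, 100]
print(ids_to_ranges(scattered))
# [(3, 5), (10, 11), (42, 45), (100, 101)]
```

If each range becomes its own read, seven ids here already cost four reads.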

mgeplf commented 2 years ago

Thank you for quantifying this - I've always assumed it's a problem, but never got around to looking at it.

Dispatching many small reads vs coalescing them makes a huge difference, I will have to give that a try sometime.

mgeplf commented 2 years ago

I couldn't help but try: https://github.com/BlueBrain/libsonata/pull/226

matz-e commented 2 years ago

I think this could use the same treatment as was done for the report API: nearby ids/elements could be merged into chunks that are used to aggregate reads and not hit the file system as much. Noting this as it may be the underlying issue we see in SPIND-235.
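A rough sketch of the chunking idea described above, in plain Python. This is an illustration under stated assumptions, not the report API's or libsonata's actual implementation: `coalesce`, `read_selected` and `max_gap` are hypothetical names, ranges are assumed sorted, non-overlapping and half-open, and a list slice stands in for one file read.

```python
def coalesce(ranges, max_gap):
    """Merge sorted half-open ranges separated by at most max_gap elements."""
    merged = []
    for start, stop in ranges:
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous chunk: grow it to cover this range.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(m) for m in merged]


def read_selected(data, ranges, max_gap):
    """Return (selected values, number of bulk reads issued)."""
    values, reads, j = [], 0, 0
    for c_start, c_stop in coalesce(ranges, max_gap):
        buffer = data[c_start:c_stop]  # stands in for a single file read
        reads += 1
        # Copy out only the originally requested ranges inside this chunk.
        while j < len(ranges) and ranges[j][1] <= c_stop:
            start, stop = ranges[j]
            values.extend(buffer[start - c_start:stop - c_start])
            j += 1
    return values, reads


data = [10 * i for i in range(200)]  # fake dataset: value is 10 * index
ranges = [(3, 5), (10, 11), (42, 45), (100, 101)]
print(read_selected(data, ranges, max_gap=0))
# ([30, 40, 100, 420, 430, 440, 1000], 4)
print(read_selected(data, ranges, max_gap=8))
# ([30, 40, 100, 420, 430, 440, 1000], 3)
```

The returned values are identical in both calls; only the number of reads changes, at the cost of reading some unwanted elements in between.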

mgeplf commented 2 years ago

I should have closed this with https://github.com/BlueBrain/libsonata/pull/226, but we can reuse it for this.

I'm wondering if we could collapse the new logic for read aggregation in the edge population reading w/ the one that's in the report reading. Have you looked at that @matz-e? If not, I can have a gander at some point soon.

mgeplf commented 2 years ago

did a quick and dirty version here: https://github.com/BlueBrain/libsonata/tree/try-chunked-pop-read

can you see if that helps w/ your use case @matz-e?

it doesn't work exactly how I'd want it to yet, but it seems faster-ish.

matz-e commented 2 years ago

Thanks, @mgeplf! I'll try to test that soon™

mgeplf commented 2 years ago

It's pretty hacky, but at least it gives an idea of how things could be better.

If one sets the GAP environment variable, one can change the max chunk size.

A quick scan of values suggests ~1e6 (1048576) is a sweet spot:

(venv39) gevaert@r2i1n34 ~/private/src/libsonata (try-chunked-pop-read *)$ for GAP in 32 2048 65536 1048576 16777216 536870912 ; do export GAP; echo $GAP; /usr/bin/time python3 bench-read-pop.py; done
32
18.18user 8.88system 0:30.46elapsed 88%CPU (0avgtext+0avgdata 107316maxresident)k
0inputs+0outputs (0major+23543minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  18.19s user 8.89s system 88% cpu 30.465 total
2048
12.88user 8.76system 0:26.28elapsed 82%CPU (0avgtext+0avgdata 107312maxresident)k
0inputs+0outputs (0major+23548minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  12.88s user 8.77s system 82% cpu 26.286 total
65536
8.87user 8.91system 0:24.26elapsed 73%CPU (0avgtext+0avgdata 107324maxresident)k
0inputs+0outputs (0major+23546minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  8.87s user 8.92s system 73% cpu 24.271 total
1048576
8.22user 8.55system 0:23.31elapsed 71%CPU (0avgtext+0avgdata 107320maxresident)k
0inputs+0outputs (0major+23544minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  8.23s user 8.56s system 71% cpu 23.314 total
16777216
10.57user 8.96system 0:25.92elapsed 75%CPU (0avgtext+0avgdata 134480maxresident)k
0inputs+0outputs (0major+30459minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  10.57s user 8.97s system 75% cpu 25.932 total
536870912
17.01user 9.39system 0:33.24elapsed 79%CPU (0avgtext+0avgdata 558768maxresident)k
0inputs+0outputs (0major+144125minor)pagefaults 0swaps
/usr/bin/time python3 bench-read-pop.py  17.02s user 9.39s system 79% cpu 33.253 total
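
One plausible reading of the scan above is a trade-off: a larger gap means fewer reads, but more unwanted elements pulled in (maxresident grows from about 107000 kB to about 559000 kB at the largest setting). The sketch below counts both quantities for a toy set of ranges. `read_cost` and `max_gap` are hypothetical names for illustration, and the merging rule is an assumption, not necessarily what the try-chunked-pop-read branch does.

```python
def read_cost(ranges, max_gap):
    """Return (number of reads, total elements read) after gap-based merging.

    ranges must be sorted, non-overlapping, half-open (start, stop) pairs.
    """
    reads, elements = 0, 0
    chunk_start = chunk_stop = None
    for start, stop in ranges:
        if chunk_stop is not None and start - chunk_stop <= max_gap:
            # Within the allowed gap: extend the current chunk.
            chunk_stop = max(chunk_stop, stop)
        else:
            if chunk_stop is not None:
                elements += chunk_stop - chunk_start  # close previous chunk
            reads += 1
            chunk_start, chunk_stop = start, stop
    if chunk_stop is not None:
        elements += chunk_stop - chunk_start  # close the final chunk
    return reads, elements


ranges = [(3, 5), (10, 11), (42, 45), (100, 101)]  # 7 wanted elements
for gap in (0, 8, 32, 64):
    print(gap, read_cost(ranges, gap))
# 0 (4, 7)
# 8 (3, 12)
# 32 (2, 43)
# 64 (1, 98)
```

With only 7 wanted elements, the largest gap reads 98 elements in one call, so the best setting depends on how expensive a read is relative to the extra data transferred.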