Closed jorblancoa closed 1 year ago
Thanks @jorblancoa. I think the interface is really nice. I'm however not very happy with the convoluted path of the data, which I know is not from this PR, but I would expect better from an "optimized library".
To recap, we are converting from two columns in HDF5 to a vector of structs, then here again to columns. Besides, in get(), there is an unconditional copy of all spikes for a filtering that is hardly requested (report_reader.cpp#L136).
Maybe we should go back to requirements. Do we need the get() interface as is? Is it really worth having columnar data for data which is invariably two columns?
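To make the round-trip being criticised concrete, here is a minimal Python sketch of the data path (the names are stand-ins, not the actual libsonata internals): two HDF5 columns are zipped into an array of (node_id, timestamp) structs, and get() then unzips them back into columns.

```python
# Hypothetical sketch of the convoluted data path: two columns are zipped
# into an array-of-structs, then unzipped back into columns.
node_ids = [3, 1, 2]           # stands in for the HDF5 "node_ids" dataset
timestamps = [0.1, 0.2, 0.3]   # stands in for the HDF5 "timestamps" dataset

# Step 1: columns -> vector of (id, t) pairs (the Spike structs)
spikes = list(zip(node_ids, timestamps))

# Step 2: pairs -> columns again, for columnar consumers
ids_out = [s[0] for s in spikes]
times_out = [s[1] for s in spikes]

print(ids_out, times_out)  # [3, 1, 2] [0.1, 0.2, 0.3]
```

Both conversions are pure overhead when the consumer only ever wants the two columns back.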
> I'm however not very happy with the convoluted path of the data, which I know is not from this PR, but I would expect better from an "optimized library".
I expect there needs to be a new C++ method that builds these separated vectors, which is what I was getting at here: https://github.com/BlueBrain/neurodamus/pull/8/files#discussion_r1273085740
> Besides, in get(), there is an unconditional copy of all spikes for a filtering that is hardly requested (report_reader.cpp#L136).
That copy should be fixed. The filtering, however, is important.
> Maybe we should go back to requirements. Do we need the get() interface as is? Is it really worth having columnar data for data which is invariably two columns?
On the C++ side, I think it's fine; keeping that API stable is important. On the Python side, a list of tuples is not great, but API stability is more important.
Once we have the new Python methods in place, we can start issuing warnings that people may get better performance by switching to them. Then, perhaps, we can deprecate it in the future.
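A minimal sketch of how such a transition warning could look on the Python side (the function name and message here are hypothetical, not the actual libsonata wrapper):

```python
import warnings

def get_spikes_legacy(spikes):
    """Hypothetical legacy accessor: warn, then return the list of tuples."""
    warnings.warn(
        "the list-of-tuples accessor is slower; consider the columnar "
        "accessor for better performance",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return list(spikes)

# Demonstrate that callers still get the old result, plus a warning they
# can act on before the interface is eventually deprecated.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_spikes_legacy([(1, 0.1), (2, 0.2)])

print(result)                       # [(1, 0.1), (2, 0.2)]
print(caught[0].category.__name__)  # PendingDeprecationWarning
```

PendingDeprecationWarning is silent by default, which fits the "warn first, deprecate later" plan described above.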
Shall we merge this in order to create a tag together with the LFP changes? @mgeplf
> I'm however not very happy with the convoluted path of the data, which I know is not from this PR, but I would expect better from an "optimized library".
I think @ferdonline's point, "I'm however not very happy with the convoluted path of the data, which I know is not from this PR, but I would expect better from an 'optimized library'", still stands. I think we should do it with a less convoluted data path before it gets committed.
Does that belong to this PR?
I would say so; the extra array created by https://github.com/BlueBrain/libsonata/pull/285/files#diff-cc07100b7c7235ddf263fda0d515a7b40ed1fe5b871714e0ab5c1ec5298ddb0dR1193 could be avoided if some internal methods were added to get the ids and timestamps separately.
> I would say so; the extra array created by https://github.com/BlueBrain/libsonata/pull/285/files#diff-cc07100b7c7235ddf263fda0d515a7b40ed1fe5b871714e0ab5c1ec5298ddb0dR1193 could be avoided if some internal methods were added to get the ids and timestamps separately.
Makes sense, but then the best would be to have them as members in report_reader.h, right? Otherwise the spikes need to be used anyway for filtering and then converted to the two arrays.
> Makes sense, but then the best would be to have them as members in the report_reader.h right?
Yeah, I think I understand what you mean and that would make sense.
Since we can't reuse the existing code for filtering, and Fernando's use case only needs the two raw arrays, would you be OK with that? @ferdonline @mgeplf
Don't we still have the convoluted path of the data, which was one of the main things that needed to be improved? It looks like getArrays calls get(), which makes an unconditional copy of all spikes via createSpikes, creating pairs from the underlying node_ids and timestamps. These pairs are then destructured back into two new arrays. Can't we create arrays of only the data that is required (ie: based on the gids/times), so they aren't large, and create them in the format that is required? Having an iterator would probably make that easier.
@mgeplf IIUC, at line 162 there's the fast path. If the client wants to filter the data, then the copy happens into a vector<pair>, which IMO is fair since that layout is ideal for sorting together. We could potentially further optimize the case of only filtering without ordering and avoid the copy, but I don't think we have that use case now, so I'd agree to leave it as is and touch it at a later stage.
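A quick illustration of why the pair layout is convenient for ordering (toy data, not libsonata code): a single sort on the pairs keeps both columns aligned, which is awkward to do with two separate arrays.

```python
# Sketch: with (id, t) pairs, one sort reorders both columns together.
spikes = [(2, 0.3), (1, 0.1), (3, 0.2)]
spikes.sort(key=lambda s: s[1])  # sort by timestamp; ids follow along
print(spikes)  # [(1, 0.1), (3, 0.2), (2, 0.3)]
```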
But we want to be able to filter things; that's an important use case. It seems weird to me to have users of a library have to know that there is a fast path if the client wants to sort the data, rather than doing the right thing on the library side.
On the neurodamus side, isn’t it useful to only load spikes that are going to be used in the simulation rather than loading all of them? For instance, if the simulation is going to run for 1s, but the spikes file contains 10s worth of spikes? What about the case where the simulation is only doing a subset of node_ids? Wouldn’t filtering the data before it all gets loaded be valuable? These are the sorts of optimizations that can be done here, and then they would benefit everyone using the library.
They’re also useful right now, as when people do analysis, a subset of the data is usually looked at.
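The kind of filtering described above can be sketched as a generator that drops spikes outside the wanted node subset and time window as it iterates, instead of materialising everything first (names and data here are illustrative, not libsonata API):

```python
# Hypothetical sketch: filter while iterating, so only spikes inside the
# time window and node subset are ever kept in memory.
def filtered_spikes(node_ids, timestamps, wanted_ids, t_start, t_stop):
    wanted = set(wanted_ids)
    for nid, t in zip(node_ids, timestamps):
        if nid in wanted and t_start <= t < t_stop:
            yield nid, t

node_ids = [1, 2, 3, 1, 2]
timestamps = [0.5, 1.5, 2.5, 9.0, 9.5]
# A 1 s simulation using only nodes {1, 2} touches a fraction of the data:
print(list(filtered_spikes(node_ids, timestamps, {1, 2}, 0.0, 1.0)))
# [(1, 0.5)]
```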
> Don't we still have the convoluted path of the data, which was one of the main things that needed to be improved? It looks like getArrays calls get(), which makes an unconditional copy of all spikes via createSpikes, creating pairs from the underlying node_ids and timestamps. These pairs are then destructured back into two new arrays. Can't we create arrays of only the data that is required (ie: based on the gids/times), so they aren't large, and create them in the format that is required? Having an iterator would probably make that easier.
It makes sense. I wanted to reuse code, but it's true that when calling getArrays it doesn't make sense to call get() to build the pairs only for the filtering. I pushed some changes to filter directly in the getArrays() method.
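In spirit, filtering directly in the columnar accessor means producing the two output arrays in one pass, with no intermediate vector of pairs. A minimal sketch of that shape (hypothetical names, not the actual getArrays implementation):

```python
# Hypothetical sketch: filter straight into two output arrays, skipping
# the intermediate vector of (id, t) pairs entirely.
def get_arrays(node_ids, timestamps, wanted_ids=None):
    wanted = set(wanted_ids) if wanted_ids is not None else None
    out_ids, out_times = [], []
    for nid, t in zip(node_ids, timestamps):
        if wanted is None or nid in wanted:
            out_ids.append(nid)
            out_times.append(t)
    return out_ids, out_times

ids, times = get_arrays([1, 2, 3], [0.1, 0.2, 0.3], wanted_ids={1, 3})
print(ids, times)  # [1, 3] [0.1, 0.3]
```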
> On the neurodamus side, isn’t it useful to only load spikes that are going to be used in the simulation rather
In neurodamus we can't always filter ahead of time because of setups like CoreNEURON + save-restore, so it was OK. But I get that maybe for other uses we want filters more often.
Comparing the get_dict() to the get() method, it seems like there is a performance regression when getting a subset of ids:
```python
def read_all_subset_nodes():
    sp = sr['All']
    node_ids = libsonata.Selection(ids)
    spikes = sp.get(node_ids)

def read_all_subset_nodes_structure():
    sp = sr['All']
    node_ids = libsonata.Selection(ids)
    spikes = sp.get_dict(node_ids)

print("read_all_subset_nodes")
timeit(read_all_subset_nodes)
print("read_all_subset_nodes_structure")
timeit(read_all_subset_nodes_structure)
```
Gives:
```
************* spikes-decimated-unsorted.h5
read_all_subset_nodes
1 loop, best of 5: 130.22718537412584 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 972.7277841931209 per loop
************* spikes-decimated-sorted-by-time.h5
read_all_subset_nodes
1 loop, best of 5: 130.0559630645439 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 972.6187489759177 per loop
************* spikes-decimated-sorted-by-ids.h5
read_all_subset_nodes
1 loop, best of 5: 96.04782155901194 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 972.36246633064 per loop
```
I'm not a big fan of different access mechanisms having much different execution profiles, because that means library consumers have to benchmark everything to know what to use - which isn't very ergonomic.
In this case, we'll have to fix it before merging rather than later.
> Comparing the get_dict() to the get() method, it seems like there is a performance regression when getting a subset of ids:
I should also add that the fast path (ie: get_dict()) is much faster (new vs old): 0.36195594631135464 vs 9.994007629342377, so it's very worthwhile.
Nice, fixing the regression makes a big difference:
```
************* spikes-decimated-unsorted.h5
read_all_subset_nodes
1 loop, best of 5: 131.17122020898387 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 129.58019144897116 per loop
************* spikes-decimated-sorted-by-time.h5
read_all_subset_nodes
1 loop, best of 5: 131.06627149000997 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 129.91921105398796 per loop
************* spikes-decimated-sorted-by-ids.h5
read_all_subset_nodes
1 loop, best of 5: 96.16886089701438 per loop
read_all_subset_nodes_structure
1 loop, best of 5: 129.54882664396428 per loop
```
Thanks @jorblancoa!
Modify Spike object from std::pair to struct
Following the discussions in [BBPBGLIB-1044]