It worked for me, although it took a lot longer than I expected to load. I am going to investigate why this is taking so long.
I think your error was just a network failure. (I suppose we'll want to implement retries.)
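Not lindi's actual implementation, just a minimal sketch of the kind of retry-with-backoff wrapper that could handle transient network failures (all names here are hypothetical):

```python
import time
import urllib.error
import urllib.request

def fetch_with_retries(url: str, max_attempts: int = 3, base_delay: float = 1.0) -> bytes:
    """Fetch a URL, retrying with exponential backoff on transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except (urllib.error.URLError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of retries; surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```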
The `.lindi.json` file is itself around 80 MB, so it takes a bit of time to do the initial download.
Then there are a very large number of objects in the file. But I'm surprised that it takes so long for pynwb to load and process those. I'm taking a closer look...
And I would expect the units table to be very fast to load. Looking into it.
It worked for me as well. The initial file open & read took about 30 seconds. The trials dataframe was fast. The units dataframe took another ~1.5 min.
When developing PyNWB/HDMF, we did not try to minimize the number of reads, especially when converting `DynamicTable` objects to pandas DataFrames, so there are likely to be inefficiencies there.
Regarding the units table... I think `to_dataframe()` might not make a lot of sense in this context, because it may be trying to put all the spike times in there? Not sure, but I think that may be why it takes so long. The actual loading of data using lindi should be efficient, though.
> maybe it is trying to put all the spike times in there
Yeah, all data in the table is read immediately (as opposed to lazily) when converting to a pandas DataFrame.
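To make the eager/lazy contrast concrete, here is a minimal sketch (the file name is a placeholder) of the two access patterns in PyNWB:

```python
from pynwb import NWBHDF5IO

with NWBHDF5IO("session.nwb", "r") as io:
    nwbfile = io.read()

    # Eager: materializes every column, including all spike times,
    # into memory at once -- the slow path described above.
    units_df = nwbfile.units.to_dataframe()

    # Lazy: index a ragged column through its VectorIndex to read
    # only the requested row's slice of the underlying data.
    first_unit_spikes = nwbfile.units["spike_times"][0]
```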
> When developing PyNWB/HDMF, we did not try to minimize the number of reads, especially when converting `DynamicTable` objects to pandas DataFrames
One specific example is reading of `spike_times` from the `units` table, or more broadly, reading of ragged array columns where values in `VectorData` are read via a `VectorIndex`. Here is the related issue on `hdmf_zarr` that describes this specific problem in more detail: https://github.com/hdmf-dev/hdmf-zarr/issues/141, as well as a corresponding issue on `nwb_benchmarks` to add this to our test suite: https://github.com/NeurodataWithoutBorders/nwb_benchmarks/issues/13
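For anyone unfamiliar with the layout, here is a plain-NumPy sketch (the numbers are made up) of how a ragged column like `spike_times` is stored: one flat `VectorData` array plus a `VectorIndex` of cumulative end offsets, one per row.

```python
import numpy as np

spike_times_data = np.array([0.1, 0.5, 0.9, 1.2, 2.0, 2.3, 2.4])  # VectorData
spike_times_index = np.array([3, 4, 7])  # VectorIndex: row i ends at index[i]

def row(i):
    """Slice row i out of the flat data array using the index."""
    start = 0 if i == 0 else spike_times_index[i - 1]
    return spike_times_data[start:spike_times_index[i]]

print(row(0))  # [0.1 0.5 0.9] -> spike times for unit 0
print(row(1))  # [1.2]         -> unit 1
print(row(2))  # [2.0 2.3 2.4] -> unit 2
```

Reading every row this way issues one small read per unit, which is cheap locally but adds up quickly over remote storage, hence the issues linked above.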
Thanks for all the feedback, and for developing this tool. I admittedly did not spend much time trying to debug this, I'm on a deadline... I'll definitely use it in my projects, it's really useful.
Hi, I'm testing lindi, following this discussion. This is the code I'm running:
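(The original snippet is not preserved here; below is a sketch reconstructed from lindi's documented usage, with a placeholder URL rather than the actual asset.)

```python
import lindi
import pynwb

# Placeholder -- the actual .lindi.json asset URL is not preserved here.
url = "https://lindi.neurosift.org/.../nwb.lindi.json"

f = lindi.LindiH5pyFile.from_lindi_file(url)
with pynwb.NWBHDF5IO(file=f, mode="r") as io:
    nwbfile = io.read()
    trials_df = nwbfile.trials.to_dataframe()
    units_df = nwbfile.units.to_dataframe()
```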
It worked up to (and including) `trials_df = nwbfile.trials.to_dataframe()`. However, at `units_df = nwbfile.units.to_dataframe()`, I got this error:

I haven't tested that on other assets, so I'm not sure whether the issue is specific to this one.