cboulay closed this issue 1 month ago.
I may have jumped to a conclusion about the error. Even if I try manual timestamps with `np.arange(test_data.shape[0])`, I still get the same error. It's the data that is missing the `maxshape`. We only see the error when timestamps are provided because the `TimeSeries` `__init__` call to `self._check_time_series_dimension()` only looks for `maxshape` if timestamps is not None.

But I did provide the `maxshape` argument to my `H5DataIO` object... so where did it go? It goes into the object's `io_settings`, but `H5DataIO` itself does not have a `maxshape` attribute or property.
I added the following to `H5DataIO` in `hdmf/backends/hdf5/h5_utils.py`:

```python
@property
def maxshape(self):
    return self.io_settings["maxshape"]
```
And with that, my test code works.
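To illustrate why that one-line property fixes things, here is a standalone sketch (the class below is a hypothetical stand-in, not the real hdmf `H5DataIO`): keyword arguments like `maxshape` are stashed in `io_settings`, so an attribute lookup on the wrapper fails unless a property surfaces it.

```python
class FakeH5DataIO:
    """Stand-in for hdmf's H5DataIO, keeping only the parts relevant here."""

    def __init__(self, data, **io_kwargs):
        self.data = data
        # H5DataIO stores dataset-creation options like maxshape here
        self.io_settings = io_kwargs

    @property
    def maxshape(self):
        # The one-line fix: expose maxshape from io_settings so callers
        # (e.g., TimeSeries._check_time_series_dimension) can find it.
        return self.io_settings["maxshape"]


wrapped = FakeH5DataIO([0.0, 1.0], maxshape=(None,))
print(wrapped.maxshape)  # (None,)
```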
Edit: the examples in #1011 worked because `DataChunkIterator` has a getter for `maxshape`. So I'm guessing the "alternative approach" in the docs never worked with manual timestamps because of this `maxshape` problem.
Hi @cboulay, thanks for the detailed code example and bug description!

I think using the "alternative approach" to create the `H5DataIO` objects for data and timestamps seems reasonable for your use case. Your fix to add a `maxshape` property to the `H5DataIO` object seems like a good way to address this issue. I believe this issue should mostly be on the hdmf side, so thank you for already opening the relevant issue there. Would you be interested in filing a pull request in the hdmf repository to fix this bug?
upstream PR merged
What happened?
I am streaming multiple unbounded data streams to NWB (i.e., recording live data), with one stream -> dataset per file; I intend to combine files after recording. Most of the documentation around iterative writing outlines how to wrap a `DataChunkIterator` around a generator, but I find this cumbersome for unbounded data because I need to put the generator in another thread that pulls from a queue, and use the main thread to supply data to the queue.
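For concreteness, the multi-threaded pattern described above can be sketched with the standard library alone (assuming a `queue.Queue` as the hand-off point; in the real pipeline the generator would be handed to `DataChunkIterator`):

```python
import queue
import threading

SENTINEL = None  # pushed onto the queue to signal end-of-stream

def chunk_generator(q):
    """Yield chunks from the queue until the sentinel arrives."""
    while True:
        chunk = q.get()
        if chunk is SENTINEL:
            return
        yield chunk

q = queue.Queue()
received = []

def consume():
    # In the real workflow this generator would be wrapped as
    # DataChunkIterator(data=chunk_generator(q)) and passed to write.
    for chunk in chunk_generator(q):
        received.append(chunk)

t = threading.Thread(target=consume)
t.start()
for chunk in ([1, 2], [3, 4], [5]):  # main thread supplies live data
    q.put(chunk)
q.put(SENTINEL)
t.join()
print(received)  # [[1, 2], [3, 4], [5]]
```

This works, but it is exactly the extra thread-and-queue plumbing that makes the generator-based approach feel cumbersome for live data.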
The alternative approach is more flexible and seems to fit my use case better. However, the alternative approach does not work for timestamps, even though `DataChunkIterator` does. Unfortunately, I don't fully trust the reported sample rate coming from my data sources, so I would prefer to store the timestamps rather than setting only the starting time and sampling rate.
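For context, what `maxshape=(None,)` ultimately buys at the HDF5 layer is a dataset that can be grown chunk by chunk after creation. A minimal sketch using h5py directly (file name and chunk values are illustrative, not from the issue):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "stream.h5")
with h5py.File(path, "w") as f:
    # maxshape=(None,) makes the first axis unbounded/resizable
    ds = f.create_dataset("timestamps", shape=(0,), maxshape=(None,),
                          dtype="f8", chunks=True)
    for chunk in (np.arange(0, 3), np.arange(3, 5)):
        n = ds.shape[0]
        ds.resize((n + len(chunk),))  # grow the unbounded axis
        ds[n:] = chunk                # append the new samples

with h5py.File(path, "r") as f:
    print(f["timestamps"][:])  # [0. 1. 2. 3. 4.]
```

This is the append pattern the alternative approach relies on for both the data and the timestamps datasets.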
If using a `DataChunkIterator` is preferable to using `H5DataIO`, then are there any other patterns I should try? Or do I bite the bullet and use the multi-threaded approach?

Steps to Reproduce
Traceback
Operating System
macOS
Python Executable
Python
Python Version
3.9
Package Versions
env.txt