NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

VectorData for TimeSeries `data` #1113

Open luiztauffer opened 4 years ago

luiztauffer commented 4 years ago

I am trying to make an NWBGroupSpec that extends TimeSeries, the main goal being that it accepts indexed data with the VectorData and VectorIndex types. Here's the repo for reference. I would like it to override the data field, so I do:

from pynwb.spec import NWBGroupSpec

PointCloudSeries = NWBGroupSpec(
        doc='type for storing time-varying 3D point clouds',
        neurodata_type_def='PointCloudSeries',
        neurodata_type_inc='TimeSeries',
)

PointCloudSeries.add_dataset(
        name='data',
        neurodata_type_inc='VectorData',
        doc='datapoints locations over time',
        dims=('time', '[x, y, z]'),
        shape=(None, 3),
        dtype='float',
        quantity='?'
)

The new group can be imported, but when I try to set:

from datetime import datetime
from pynwb import NWBFile
from ndx_pointcloudseries import PointCloudSeries
from hdmf.common.table import VectorIndex, VectorData

nwb = NWBFile('session_description', 'identifier', datetime.now().astimezone())

data = [[1., 1., 1.], [2., 2., 2.], [1., 2., 1.]]
data_vect = VectorData(name='data', description='desc', data=data)
indexes = [2, 3]
data_ind = VectorIndex(name='data_index', data=indexes, target=data_vect)

pcs = PointCloudSeries(
        name='PointCloudSeries',
        data=data_vect,
        data_index=data_ind,
        rate=10.
)
nwb.add_acquisition(pcs)

I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-4b65ab8f601c> in <module>
     16         data=data_vect,
     17         data_index=data_ind,
---> 18         rate=10.
     19     )
     20 nwb.add_acquisition(pcs)

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    457                     if parse_err:
    458                         msg = ', '.join(parse_err)
--> 459                         raise_from(ExceptionType(msg), None)
    460 
    461                 return func(self, **parsed['args'])

~\AppData\Roaming\Python\Python37\site-packages\six.py in raise_from(value, from_value)

TypeError: incorrect type for 'data' (got 'VectorData', expected 'ndarray, list, tuple, Dataset, HDMFDataset, AbstractDataChunkIterator, DataIO or TimeSeries')

The number of datapoints changes over time, so the data needs to be indexed, and we also wanted to leverage the time-slicing methods that come with TimeSeries. So, I have some questions:

  1. Is it possible to override existing fields when inheriting from an existing group?
  2. If so, would the time-slicing methods of TimeSeries work with indexed data?
  3. Otherwise, what would you suggest? Maybe use a DynamicTable?

Thanks!


oruebel commented 4 years ago

Or else what would you suggest me to do? Maybe use a DynamicTable?

As a first test, I would suggest adding VectorData as an allowed type for the data argument of the TimeSeries constructor here:

https://github.com/NeurodataWithoutBorders/pynwb/blob/eeef0eb8ff4119b1a69bd172db1e98827299b278/src/pynwb/base.py#L104

If this works, then this would at least tell us that the read/write can work in principle. I'm not sure about other functionality of TimeSeries, but let's take this issue step-by-step.
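For reference, the mechanism here is docval's type checking: each argument is validated against a tuple of allowed types, so the edit amounts to adding VectorData to that tuple in the @docval entry for data in src/pynwb/base.py. A minimal sketch of the mechanism on a toy class (Toy and set_data are hypothetical stand-ins, not pynwb API):

from hdmf.utils import docval, getargs
from hdmf.common.table import VectorData

class Toy:  # stand-in for illustration only, not pynwb's TimeSeries
    # docval validates each argument against its 'type' tuple; adding
    # VectorData to the tuple is what makes the TypeError go away
    @docval({'name': 'data',
             'type': ('array_data', 'data', VectorData),
             'doc': 'the data to store'})
    def set_data(self, **kwargs):
        self.data = getargs('data', kwargs)

t = Toy()
t.set_data(data=VectorData(name='data', description='desc', data=[[1., 1., 1.]]))
print(type(t.data))  # VectorData now passes validation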

Is it possible to override existing fields when inheriting from an existing group?

It is possible to refine the spec of existing fields, but not to overwrite them. For example, you can change the dtype of a dataset, but you can't change its neurodata_type (you can only reuse existing neurodata_types or create new ones). In this particular case you are adding a neurodata_type to TimeSeries.data, which did not have a type before. This is a corner case that I don't think we have encountered before, and I'm not sure whether it is allowed. @ajtritt do you know?
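To make the refine-vs-overwrite distinction concrete, here is a sketch of an allowed refinement (FloatSeries is an illustrative name, not an existing type):

from pynwb.spec import NWBGroupSpec

FloatSeries = NWBGroupSpec(
    doc='a TimeSeries whose data is refined to float32',
    neurodata_type_def='FloatSeries',
    neurodata_type_inc='TimeSeries',
)
# refining the dtype of the inherited 'data' dataset is allowed;
# replacing its neurodata_type would not be
FloatSeries.add_dataset(
    name='data',
    doc='data, refined to float32',
    dtype='float32',
)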

If it is possible, would time slicing methods for TimeSeries work with indexed data?

I'm not sure this would work right out of the box. E.g., in your case you set data to the VectorData object, but for time slicing you would actually need to slice against the VectorIndex dataset. I would imagine that you would probably need to set TimeSeries.data to the VectorIndex and make sure that this is handled in the ObjectMapper.
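To illustrate why the index is the thing to slice, a small sketch using the objects from the example above (this relies on hdmf's ragged lookup through VectorIndex):

from hdmf.common.table import VectorData, VectorIndex

data_vect = VectorData(name='data', description='desc',
                       data=[[1., 1., 1.], [2., 2., 2.], [1., 2., 1.]])
# each entry of the index is the cumulative end-offset of one timepoint
data_ind = VectorIndex(name='data_index', data=[2, 3], target=data_vect)

print(data_ind[0])  # points at timepoint 0 -> rows 0:2 of data
print(data_ind[1])  # points at timepoint 1 -> row 2 of data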

luiztauffer commented 4 years ago

Thanks for the explanation @oruebel! I added VectorData as an allowed type on line 104 as suggested, and now I can construct the PointCloudSeries object:

pcs = PointCloudSeries(
        name='PointCloudSeries',
        data=data_vect,
        data_index=data_ind,
        rate=10.
    )
nwb.add_acquisition(pcs)
print(nwb.acquisition['PointCloudSeries'])

gives:

PointCloudSeries abc.PointCloudSeries at 0x1661569459016
Fields:
  comments: no comments
  conversion: 1.0
  data: data <class 'hdmf.common.table.VectorData'>
  data_index: data_index <class 'hdmf.common.table.VectorIndex'>
  description: no description
  rate: 10.0
  resolution: -1.0
  starting_time: 0.0

Now the error happens when trying to write the file:

from pynwb import NWBHDF5IO

with NWBHDF5IO('test.nwb', 'w') as io:
    io.write(nwb)

gives:

C:\Users\Luiz\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\build\map.py:1041: OrphanContainerWarning: 'data' (VectorData) for 'PointCloudSeries' (PointCloudSeries)
  warnings.warn(msg, OrphanContainerWarning)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __add_refs(self)
    528             try:
--> 529                 call()
    530             except KeyError:

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in _filler()
    646             def _filler():
--> 647                 obj.attrs[key] = self.__get_ref(value)
    648         return _filler

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    460 
--> 461                 return func(self, **parsed['args'])
    462         else:

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __get_ref(self, **kwargs)
   1076         else:
-> 1077             return self.__file[path].ref
   1078 

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\h5py\_hl\group.py in __getitem__(self, name)
    263         else:
--> 264             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    265 

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'data' doesn't exist)"

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-a411a132fe37> in <module>
     23 # Write nwb file
     24 with NWBHDF5IO('test_pointcloudseries.nwb', 'w') as io:
---> 25     io.write(nwb)
     26 
     27 ## Read nwb file and check its content

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    459                         raise_from(ExceptionType(msg), None)
    460 
--> 461                 return func(self, **parsed['args'])
    462         else:
    463             def func_call(*args, **kwargs):

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in write(self, **kwargs)
    267 
    268         cache_spec = popargs('cache_spec', kwargs)
--> 269         call_docval_func(super(HDF5IO, self).write, kwargs)
    270         if cache_spec:
    271             ref = self.__file.attrs.get(SPEC_LOC_ATTR)

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in call_docval_func(func, kwargs)
    348 def call_docval_func(func, kwargs):
    349     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 350     return func(*fargs, **fkwargs)
    351 
    352 

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    459                         raise_from(ExceptionType(msg), None)
    460 
--> 461                 return func(self, **parsed['args'])
    462         else:
    463             def func_call(*args, **kwargs):

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\io.py in write(self, **kwargs)
     42         container = popargs('container', kwargs)
     43         f_builder = self.__manager.build(container, source=self.__source)
---> 44         self.write_builder(f_builder, **kwargs)
     45 
     46     @abstractmethod

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    459                         raise_from(ExceptionType(msg), None)
    460 
--> 461                 return func(self, **parsed['args'])
    462         else:
    463             def func_call(*args, **kwargs):

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in write_builder(self, **kwargs)
    511             self.write_link(self.__file, lbldr)
    512         self.set_attributes(self.__file, f_builder.attributes)
--> 513         self.__add_refs()
    514         self.__exhaust_dcis()
    515 

~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __add_refs(self)
    530             except KeyError:
    531                 if id(call) in failed:
--> 532                     raise RuntimeError('Unable to resolve reference')
    533                 failed.add(id(call))
    534                 self.__ref_queue.append(call)

RuntimeError: Unable to resolve reference

Any ideas?

Update: the 'data' in the error KeyError: "Unable to open object (object 'data' doesn't exist)" is the name given to the VectorData object. If I change the name to e.g. data_name, the error becomes: KeyError: "Unable to open object (object 'data_name' doesn't exist)". I couldn't figure out anything beyond that, though =/

oruebel commented 4 years ago

I'm wondering whether there may be an issue with the ObjectMapping here, but I have not had the chance to dig deeper. @ajtritt do you have any idea?

ajtritt commented 4 years ago

I'm wondering whether there may be an issue with the ObjectMapping here

@oruebel you are correct. ObjectMapping is choking on having a Container passed as a concrete dataset.

The only way around this would be to change the TimeSeries.data spec to be a VectorData. I'm open to such a change, but it would probably be disruptive, so we should discuss that further.

Going the route of defining PointCloudSeries as a DynamicTable would probably be easier, though.
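For what it's worth, a minimal sketch of that route, with illustrative column names and assuming your pynwb version accepts a DynamicTable in acquisition:

from datetime import datetime
from pynwb import NWBFile, NWBHDF5IO
from hdmf.common import DynamicTable

# column names here are illustrative, not part of any released extension
table = DynamicTable(name='PointCloudSeries',
                     description='time-varying 3D point clouds')
table.add_column(name='timestamps', description='time of each point cloud')
# index=True makes this a ragged column backed by VectorData + VectorIndex
table.add_column(name='point_cloud', description='points at each time', index=True)
table.add_row(timestamps=0.0, point_cloud=[[1., 1., 1.], [2., 2., 2.]])
table.add_row(timestamps=0.1, point_cloud=[[1., 2., 1.]])

nwb = NWBFile('session_description', 'identifier', datetime.now().astimezone())
nwb.add_acquisition(table)
with NWBHDF5IO('test_pointcloudseries.nwb', 'w') as io:
    io.write(nwb)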