NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

[Bug]: HDFView 3.3.0 shows incorrect references in compound types #1744

Closed: rly closed this 1 year ago

rly commented 1 year ago

What happened?

This is not a bug in PyNWB, but a bug in HDFView. I am documenting it here because it impacts NWB users.

HDFView 3.3.0 displays the data of a TimeSeriesReferenceVectorData (as used in the TimeIntervals neurodata_type) incorrectly. A TimeSeriesReferenceVectorData is a dataset with a compound dtype consisting of (int, int, object reference to a ``TimeSeries`` object). In HDFView, every row's reference appears to point to the same object: the first one.

For demonstration, I created a file with a TimeSeriesReferenceVectorData with two rows that point to two different TimeSeries objects. In the screenshot below, the left side shows the HDFView data viewer for this dataset: the timeseries field of the compound dtype shows the same object reference name in both rows, even though the rows point to two different objects. The right side shows the same data read with PyNWB/h5py, where the references are correctly different.

[Screenshot: HDFView data viewer for the dataset (left) and the same data read with PyNWB/h5py (right)]

I haven't tested this with other compound dtypes in NWB or in HDF5.
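One way to check whether the display problem is specific to NWB would be to open a plain HDF5 file with the same kind of compound dtype in HDFView. The sketch below writes such a file with h5py alone, assuming h5py 2.10 or later for `h5py.ref_dtype`. The function name and the dataset names (`ts1`, `ts2`, `refs`) are hypothetical, and the field names `idx_start`, `count`, and `timeseries` are assumed to mirror the NWB schema for TimeSeriesReferenceVectorData. Whether HDFView 3.3.0 mis-displays this file as well is untested.

```python
import h5py
import numpy as np

# Compound row type: two 32-bit integers plus an HDF5 object reference.
# Field names are assumed to mirror TimeSeriesReferenceVectorData in the NWB schema.
REF_ROW_DTYPE = np.dtype([
    ("idx_start", "<i4"),
    ("count", "<i4"),
    ("timeseries", h5py.ref_dtype),
])


def write_compound_ref_file(path: str) -> None:
    """Write two datasets plus a compound dataset whose rows reference them.

    Hypothetical helper: PyNWB is not involved, so the file can be opened in
    HDFView to see whether the display problem also occurs outside NWB.
    """
    with h5py.File(path, "w") as f:
        ts1 = f.create_dataset("ts1", data=[1, 2, 3, 4, 5])
        ts2 = f.create_dataset("ts2", data=[1, 2, 3, 4, 5])
        # Row 0 references ts1, row 1 references ts2.
        rows = np.array(
            [(0, 1, ts1.ref), (1, 2, ts2.ref)],
            dtype=REF_ROW_DTYPE,
        )
        f.create_dataset("refs", data=rows)
```

After calling `write_compound_ref_file("compound_refs.h5")`, a viewer that renders references in compound types correctly should show `/ts1` and `/ts2` in the `timeseries` field of `refs`, not the same object twice.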

Steps to Reproduce

```python
# demonstration of writing a TimeSeriesReferenceVectorData object to a file
# for showing how HDFView does not display the object references correctly

from pynwb import NWBFile, NWBHDF5IO, validate, TimeSeries
from pynwb.base import TimeSeriesReferenceVectorData, DynamicTable
import datetime

nwbfile = NWBFile(
    session_description="session_description",
    identifier="identifier",
    session_start_time=datetime.datetime.now(datetime.timezone.utc),
)

ts1 = TimeSeries(
    name="test_timeseries1",
    data=[1, 2, 3, 4, 5],
    unit="m",
    rate=1.0
)

ts2 = TimeSeries(
    name="test_timeseries2",
    data=[1, 2, 3, 4, 5],
    unit="m",
    rate=1.0
)

tsref = TimeSeriesReferenceVectorData(
    name="test_timeseries_reference_vector_data",
    description="description",
    data=[(0, 1, ts1), (1, 2, ts2)],
)

dt = DynamicTable(
    name="test_dynamic_table",
    description="description",
    columns=[tsref],
)

nwbfile.add_acquisition(ts1)
nwbfile.add_acquisition(ts2)
nwbfile.add_acquisition(dt)
filename = "test_tsref.nwb"

with NWBHDF5IO(filename, "w") as io:
    io.write(nwbfile)

with NWBHDF5IO(filename, "r") as io:
    errors = validate(io)
    print("errors:", errors)
    nwbfile = io.read()
    print(nwbfile)
    print(nwbfile.acquisition["test_dynamic_table"]["test_timeseries_reference_vector_data"].data[:])
```
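To confirm that the file written by the script above stores distinct references, without relying on either PyNWB or HDFView to resolve them, the raw references can be dereferenced with plain h5py. This is a sketch; the function name `reference_targets` is hypothetical, and the default field name `timeseries` is taken from the field shown in the HDFView screenshot.

```python
import h5py


def reference_targets(path: str, dataset_path: str, field: str = "timeseries") -> list[str]:
    """Return the HDF5 path that each row's object reference points to.

    Reads the compound dataset with plain h5py, so the result does not depend
    on how PyNWB or HDFView resolve references.
    """
    with h5py.File(path, "r") as f:
        rows = f[dataset_path][:]  # structured NumPy array, one record per row
        # f[ref] dereferences an h5py.Reference; .name is the target's path.
        return [f[ref].name for ref in rows[field]]
```

For the reproduction file, `reference_targets("test_tsref.nwb", "/acquisition/test_dynamic_table/test_timeseries_reference_vector_data")` is expected to return two different paths (presumably `/acquisition/test_timeseries1` and `/acquisition/test_timeseries2`), matching what PyNWB reports rather than what HDFView displays.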

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.11

Package Versions

No response


oruebel commented 1 year ago

I would recommend creating an issue on https://github.com/HDFGroup/hdfview and linking to this one.

rly commented 1 year ago

I created an issue there: https://github.com/HDFGroup/hdfview/issues/147