[Bug]: Saving electrodes information makes io writing/reading nwb file to crash in notebooks

etiennedemontalivet commented 6 months ago

What happened?

When I save electrodes information in NWB file (using Ecephys tutorial) and that I then write/read the NWB file using basic IO (writing, reading), the NWB file is displayed correctly with the in-RAM NWB file (before saving) but crashes with the loaded NWB file (after writing/reading back).

I would expect the NWB file that has been read to display like this:

Steps to Reproduce

from datetime import datetime
from uuid import uuid4

import numpy as np
from dateutil.tz import tzlocal
from pynwb import NWBHDF5IO, NWBFile
from pynwb.ecephys import LFP, ElectricalSeries

nwbfile = NWBFile(
    session_description="my first synthetic recording",
    identifier=str(uuid4()),
    session_start_time=datetime.now(tzlocal()),
    experimenter=[
        "Baggins, Bilbo",
    ],
    lab="Bag End Laboratory",
    institution="University of Middle Earth at the Shire",
    experiment_description="I went on an adventure to reclaim vast treasures.",
    session_id="LONELYMTN001",
)

device = nwbfile.create_device(
    name="array", description="the best array", manufacturer="Probe Company 9000"
)

nwbfile.add_electrode_column(name="label", description="label of electrode")

nshanks = 4
nchannels_per_shank = 3
electrode_counter = 0

for ishank in range(nshanks):
    # create an electrode group for this shank
    electrode_group = nwbfile.create_electrode_group(
        name="shank{}".format(ishank),
        description="electrode group for shank {}".format(ishank),
        device=device,
        location="brain area",
    )
    # add electrodes to the electrode table
    for ielec in range(nchannels_per_shank):
        nwbfile.add_electrode(
            group=electrode_group,
            label="shank{}elec{}".format(ishank, ielec),
            location="brain area",
        )
        electrode_counter += 1

all_table_region = nwbfile.create_electrode_table_region(
    region=list(range(electrode_counter)),  # reference row indices 0 to N-1
    description="all electrodes",
)
display(nwbfile)

io = NWBHDF5IO("basics_tutorial.nwb", mode="w")
io.write(nwbfile)
io.close()

with NWBHDF5IO("basics_tutorial.nwb", "r") as io:
    read_nwbfile = io.read()

display(read_nwbfile)

Traceback

{
    "name": "ValueError",
    "message": "Dset_id is not a dataset id (dset_id is not a dataset ID)",
    "stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\IPython\\core\\formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\container.py:618, in Container._repr_html_(self)
    616 html_repr += \"<div class='container-wrap'>\"
    617 html_repr += f\"<div class='container-header'><div class='xr-obj-type'><h3>{header_text}</h3></div></div>\"
--> 618 html_repr += self._generate_html_repr(self.fields, is_field=True)
    619 html_repr += \"</div>\"
    620 return html_repr

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\container.py:629, in Container._generate_html_repr(self, fields, level, access_code, is_field)
    627     for key, value in fields.items():
    628         current_access_code = f\"{access_code}.{key}\" if is_field else f\"{access_code}['{key}']\"
--> 629         html_repr += self._generate_field_html(key, value, level, current_access_code)
    630 elif isinstance(fields, list):
    631     for index, item in enumerate(fields):

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\container.py:649, in Container._generate_field_html(self, key, value, level, access_code)
    645     return f'<div style=\"margin-left: {level * 20}px;\" class=\"container-fields\"><span class=\"field-key\"' \\
    646            f' title=\"{access_code}\">{key}: </span><span class=\"field-value\">{value}</span></div>'
    648 if hasattr(value, \"generate_html_repr\"):
--> 649     html_content = value.generate_html_repr(level + 1, access_code)
    651 elif hasattr(value, '__repr_html__'):
    652     html_content = value.__repr_html__()

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\common\\table.py:1225, in DynamicTable.generate_html_repr(self, level, access_code, nrows)
   1222     if key not in (\"id\", \"colnames\", \"columns\"):
   1223         out += self._generate_field_html(key, value, level, access_code)
-> 1225 inside = f\"{self[:min(nrows, len(self))].to_html()}\"
   1227 if len(self) == nrows + 1:
   1228     inside += \"<p>... and 1 more row.</p>\"

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\common\\table.py:947, in DynamicTable.__getitem__(self, key)
    946 def __getitem__(self, key):
--> 947     ret = self.get(key)
    948     if ret is None:
    949         raise KeyError(key)

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\common\\table.py:1005, in DynamicTable.get(self, key, default, df, index, **kwargs)
   1001         return default
   1002 else:
   1003     # index by int, list, np.ndarray, or slice -->
   1004     # return pandas Dataframe or lists consisting of one or more rows
-> 1005     sel = self.__get_selection_as_dict(key, df, index, **kwargs)
   1006     if df:
   1007         # reformat objects to fit into a pandas DataFrame
   1008         if np.isscalar(key):

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\common\\table.py:1056, in DynamicTable.__get_selection_as_dict(self, arg, df, index, exclude, **kwargs)
   1054         raise IndexError(msg) from ve
   1055     else:  # pragma: no cover
-> 1056         raise ve
   1057 except IndexError as ie:
   1058     x = re.match(r\"^Index \\((.*)\\) out of range for \\(.*\\)$\", str(ie))

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\common\\table.py:1033, in DynamicTable.__get_selection_as_dict(self, arg, df, index, exclude, **kwargs)
   1030 ret = OrderedDict()
   1031 try:
   1032     # index with a python slice or single int to select one or multiple rows
-> 1033     ret['id'] = self.id[arg]
   1034     for name in self.colnames:
   1035         if name in exclude:

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\container.py:858, in Data.__getitem__(self, args)
    857 def __getitem__(self, args):
--> 858     return self.get(args)

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\hdmf\\container.py:866, in Data.get(self, args)
    863 if isinstance(self.data, h5py.Dataset) and isinstance(args, np.ndarray):
    864     # This is needed for h5py 2.9 compatibility
    865     args = args.tolist()
--> 866 return self.data[args]

File h5py\\_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py\\_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File c:\\Users\\etienne\\miniconda3\\envs\\syn_decoder\\lib\\site-packages\\h5py\\_hl\\dataset.py:741, in Dataset.__getitem__(self, args, new_dtype)
    739 if self._fast_read_ok and (new_dtype is None):
    740     try:
--> 741         return self._fast_reader.read(args)
    742     except TypeError:
    743         pass  # Fall back to Python read pathway below

File h5py\\_selector.pyx:370, in h5py._selector.Reader.read()

ValueError: Dset_id is not a dataset id (dset_id is not a dataset ID)"
}

Operating System

Windows

Python Executable

Python

Python Version

3.9

Package Versions

pyproject_issue.txt

Code of Conduct

[X] I agree to follow this project's Code of Conduct
[X] Have you checked the Contributing document?
[X] Have you ensured this bug was not already reported?

oruebel commented 6 months ago

Which version of HDMF and PyNWB are you using? Based on the error I suspect the issue is that the file is closed, i.e., NWBHDF5IO is being created in the context and then display(read_nwbfile) is being called outside of the context, at which point NWBHDF5IO no longer exists. I.e., I suspect if you move the display(read_nwbfile) inside of the context, i.e., if you replace:

with NWBHDF5IO("basics_tutorial.nwb", "r") as io:
    read_nwbfile = io.read()

display(read_nwbfile)

with

with NWBHDF5IO("basics_tutorial.nwb", "r") as io:
     read_nwbfile = io.read()
     display(read_nwbfile)

that the error will probably go away. It would be great if you could test if that fixes the problem.

I believe this issue was addressed in https://github.com/hdmf-dev/hdmf/pull/882, which is why I was asking about which versions of PyNWB and HDMF you have installed.

etiennedemontalivet commented 6 months ago

Thanks! That fixes the problem 👍 PS: I'm using

pynwb: 2.5.0
hdmf: 3.12.1

oruebel commented 6 months ago

Glad that fixed the issue.

Thanks for adding the package versions. Looking at the release notes, I think this issue should already be fixed in those versions. @stephprince could you run the above example in a Jupyter notebook to see if you see the same issue, just to make sure this is really fixed.

stephprince commented 6 months ago

I tested in a notebook and see the same issue.

Everything works ok when I make a function that returns io.read() like described in the hdmf issue you linked:

def read_nwb(fname):
    io = NWBHDF5IO(fname, mode='r')
    return io.read()
nwbfile = read_nwb("basics_tutorial.nwb")
display(nwbfile) # looks normal

But I get the same ValueError: Dset_id is not a dataset id (dset_id is not a dataset ID) as above when I run the code below. However, I'm still able to access all the fields except for the electrodes table data.

with NWBHDF5IO("basics_tutorial.nwb", "r") as io:
    nwbfile = io.read()
display(nwbfile) # error

rly commented 6 months ago

This same issue came up with someone else I was working with. Before the new awesome expandable html representation of an HDMF container, printing an HDMF container (e.g., NWB file) in a Jupyter notebook after the IO object has been closed did not cause an error. Now it does. Printing a container in such a situation is not recommended because the file is closed and datasets are not accessible. However, it is a common mistake to try to print the container outside of a context manager. I suggest we improve the messaging around this.

Option 1: We get rid of the error if possible. I'm not sure why a dataset is being accessed such that we get a ValueError. I think it has to do with printing DynamicTable objects but I have not looked into this.

Option 2: If the IO object is closed, we change the print behavior so that it simply says <Closed [container name]>, e.g., <Closed NWBFile>, just like h5py prints <Closed HDF5 file> or <Closed HDF5 group> if the file or group has already been closed. This might catch all issues with closed IO objects, but at the same time, the current behavior where users can still access the already-read non-dataset metadata in the container after the IO object is closed is kind of nice.

oruebel commented 6 months ago

after the IO object has been closed did not cause an error.

We should check what the context manager is doing. https://github.com/hdmf-dev/hdmf/pull/882 added the functionality to have the io object stored on the Container so that it can remain open. This change was to allow us to return the Container from a function without having to also return the io object explicitly. However, I think this same approach should also work here, as long as we have a way to make sure that NWBHDF5IO does not call close when leaving the context. In particular in read mode this should should be save since the file will be closed when the last Container gets deleted.

NeurodataWithoutBorders / pynwb