NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

[Bug]: 2-photon series movie not appearing in file. #1826

Closed rcpeene closed 10 months ago

rcpeene commented 10 months ago

What happened?

We have packaged our NWB files and added 2-photon movies to them. This is evidenced by the fact that the file grows significantly in size and that the movie can be seen when exploring the file in .h5 format. However, when opening with PyNWB, the object does not appear in nwb.acquisition.
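For reference, a minimal sketch of that kind of HDF5-level check, assuming the standard NWB layout where acquisition objects live under the /acquisition group (the file path is a placeholder):

import h5py

# List the acquisition groups directly at the HDF5 level, bypassing PyNWB.
with h5py.File("packaged_file.nwb", "r") as f:  # placeholder path
    print(list(f["acquisition"].keys()))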

Steps to Reproduce

- package a file using the following code snippet to insert a 2-photon series

ts = TwoPhotonSeries(
    name='raw_suite2p_motion_corrected',
    imaging_plane=plane,
    data=wrapped_data,
    format='raw',
    unit='SIunit',
    rate=10.71
)
input_nwb.add_acquisition(ts)
io.write(input_nwb)

- open the file with PyNWB
- run `print(nwb.acquisition.keys())`
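A minimal sketch of those two steps, with a placeholder file path:

from pynwb import NWBHDF5IO

# Re-open the packaged file read-only and list what PyNWB sees in acquisition.
with NWBHDF5IO("packaged_file.nwb", "r", load_namespaces=True) as io:  # placeholder path
    nwb = io.read()
    print(nwb.acquisition.keys())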

Traceback

No traceback

Operating System

Windows

Python Executable

Python

Python Version

3.9

Package Versions

accessible-pygments==0.0.4 aiohttp==3.7.4 aiosignal==1.3.1 alabaster==0.7.12 anyio==3.6.2 appdirs==1.4.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 argschema==2.0.2 arrow==1.2.3 asciitree==0.3.3 asttokens==2.2.0 async-timeout==3.0.1 attrs==21.4.0 Babel==2.10.3 backcall==0.2.0 bcrypt==4.0.1 beautifulsoup4==4.11.1 bg-atlasapi==1.0.2 bg-space==0.6.0 bidsschematools==0.7.1 bleach==5.0.1 boto3==1.28.10 botocore==1.31.10 bqplot==0.12.36 brainrender==2.0.5.5 bs4==0.0.1 cachetools==4.2.4 ccfwidget==0.5.3 cebra==0.2.0 cellpose==2.2.2 certifi==2022.9.24 cffi==1.15.1 chardet==3.0.4 charset-normalizer==2.1.1 ci-info==0.3.0 click==8.1.3 click-didyoumean==0.3.0 cloudpickle==2.2.0 colorama==0.4.6 colorcet==3.0.1 commonmark==0.9.1 contourpy==1.0.6 coverage==7.2.1 cryptography==41.0.3 cycler==0.11.0 dandi==0.55.1 dandischema==0.8.4 dask==2022.11.1 -e git+https://github.com/AllenInstitute/openscope_databook.git@1739f7510547142849b480093ad6f2789b8045c2#egg=databook_utils debugpy==1.6.4 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.13 distro==1.8.0 dnspython==2.2.1 docutils==0.17.1 elephant==0.12.0 email-validator==1.3.0 entrypoints==0.4 etelemetry==0.3.0 exceptiongroup==1.1.0 execnet==1.9.0 executing==1.2.0 fasteners==0.18 fastjsonschema==2.16.2 fastremap==1.13.5 filelock==3.12.2 fonttools==4.38.0 fqdn==1.5.1 frozenlist==1.3.3 fscacher==0.3.0 fsspec==2022.11.0 future==0.18.2 gast==0.4.0 gitdb==4.0.9 GitPython==3.1.27 Glymur==0.8.19 google==3.0.0 greenlet==1.1.3 h5py==3.7.0 hdmf==3.9.0 humanize==4.4.0 idna==3.4 imagecodecs==2022.9.26 imageio==2.22.4 imagesize==1.4.1 importlib-metadata==4.13.0 importlib-resources==5.10.0 iniconfig==2.0.0 interleave==0.2.1 ipycanvas==0.13.1 ipydatagrid==1.1.14 ipydatawidgets==4.3.2 ipyevents==2.0.1 ipykernel==6.17.1 ipympl==0.9.2 ipysheet==0.5.0 ipython==8.7.0 ipython-genutils==0.2.0 ipytree==0.2.2 ipyvolume==0.6.0a10 ipyvtklink==0.2.3 ipyvue==1.8.0 ipyvuetify==1.8.4 ipywebrtc==0.6.0 ipywidgets==7.7.2 isodate==0.6.1 isoduration==20.11.0 itk-core==5.3.0 itk-filtering==5.3.0 itk-meshtopolydata==0.10.0 itk-numerics==5.3.0 itkwidgets==0.32.4 jaraco.classes==3.2.3 jedi==0.18.2 Jinja2==3.1.2 JIT==0.0.1 jmespath==0.10.0 joblib==1.2.0 jsonpointer==2.3 jsonschema==3.2.0 jupyter==1.0.0 jupyter-book==0.15.1 jupyter-cache==0.6.1 jupyter-console==6.4.4 jupyter-server==1.23.3 jupyter-server-mathjax==0.2.6 jupyter-sphinx==0.3.2 jupyter_client==7.4.7 jupyter_core==5.1.0 jupyterlab-pygments==0.2.2 jupyterlab-widgets==1.1.1 K3D==2.7.4 keyring==23.11.0 keyrings.alt==4.2.0 kiwisolver==1.4.4 latexcodec==2.0.1 linkify-it-py==2.0.2 literate-dataclasses==0.0.6 llvmlite==0.40.1 locket==1.0.0 loguru==0.6.0 lxml==4.9.1 markdown-it-py==1.1.0 MarkupSafe==2.1.1 marshmallow==3.0.0rc6 matplotlib==3.6.2 matplotlib-inline==0.1.6 matplotlib-venn==0.11.9 mdit-py-plugins==0.3.5 meshio==5.3.4 mistune==0.8.4 more-itertools==9.0.0 morphapi==0.1.7 MorphIO==3.3.3 mpl-interactions==0.22.0 mpmath==1.3.0 msgpack==1.0.4 multidict==6.0.2 munkres==1.1.4 myst-nb==0.17.2 myst-parser==0.18.1 myterial==1.2.1 natsort==8.2.0 nbclassic==0.4.8 nbclient==0.5.13 nbconvert==6.5.4 nbdime==3.1.1 nbformat==5.7.0 nbmake==1.3.5 nd2==0.7.1 ndx-events==0.2.0 ndx-grayscalevolume==0.0.2 ndx-icephys-meta==0.1.0 ndx-spectrum==0.2.2 neo==0.12.0 nest-asyncio==1.5.6 networkx==2.8.8 neurom==3.2.2 notebook==6.5.2 notebook_shim==0.2.2 numba==0.57.1 numcodecs==0.10.2 numexpr==2.8.3 numpy==1.23.5 nwbinspector==0.4.29 nwbwidgets==0.10.0 opencv-python==4.6.0.66 opencv-python-headless==4.8.0.74 ophys-nway-matching @ 
git+https://github.com/AllenInstitute/ophys_nway_matching@545504ab55922717ab623f8ede2c521a60aa1458 packaging==21.3 pandas==1.5.2 pandocfilters==1.5.0 param==1.12.2 paramiko==3.3.1 parso==0.8.3 partd==1.3.0 patsy==0.5.3 pickleshare==0.7.5 Pillow==9.3.0 -e git+https://github.com/AllenNeuralDynamics/physiology_codeocean_pipelines_paper.git@3bfed5c03bbc0494227ead9fbfab332874926510#egg=pipelinedatabook_utils pkgutil_resolve_name==1.3.10 platformdirs==2.5.4 plotly==5.11.0 pluggy==1.0.0 prometheus-client==0.15.0 prompt-toolkit==3.0.33 psutil==5.9.4 psycopg2-binary==2.9.5 pure-eval==0.2.2 py==1.11.0 py2vega==0.6.1 pybtex==0.24.0 pybtex-docutils==1.0.2 pycparser==2.21 pycryptodomex==3.16.0 pyct==0.4.8 pydantic==1.10.2 pydata-sphinx-theme==0.13.3 Pygments==2.13.0 pyinspect==0.1.0 PyNaCl==1.5.0 pynrrd==0.4.3 pynwb==2.2.0 pyout==0.7.2 pyparsing==3.0.9 PyPDF2==3.0.1 PyQt5==5.15.9 pyqt5-plugins==5.15.9.2.3 PyQt5-Qt5==5.15.2 PyQt5-sip==12.12.2 pyqt5-tools==5.15.9.3.3 pyqtgraph==0.13.3 pyrsistent==0.19.2 pytest==7.2.1 pytest-cov==4.0.0 pytest-xdist==3.2.1 python-dateutil==2.8.2 python-dotenv==1.0.0 pythreejs==2.4.1 pytz==2022.6 PyWavelets==1.4.1 pywin32==306 pywin32-ctypes==0.2.0 pywinpty==2.0.10 PyYAML==6.0 pyzmq==24.0.1 qt5-applications==5.15.2.2.3 qt5-tools==5.15.2.1.3 qtconsole==5.4.0 QtPy==2.3.0 quantities==0.14.1 rastermap==0.1.3 requests==2.28.1 requests-toolbelt==0.10.1 resource-backed-dask-array==0.1.0 retry==0.9.2 rfc3339-validator==0.1.4 rfc3987==1.3.8 rich==12.6.0 roifile==2023.5.12 ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.7 s3transfer==0.6.1 sbxreader==0.2.2 scanimage-tiff-reader==1.4.1.4 scikit-build==0.16.4 scikit-image==0.19.3 scikit-learn==1.1.2 scipy==1.9.3 seaborn==0.12.1 semantic-version==2.10.0 semver==2.13.0 Send2Trash==1.8.0 SimpleITK==2.2.1 simplejson==3.18.0 six==1.16.0 smmap==5.0.0 sniffio==1.3.0 snowballstemmer==2.2.0 soupsieve==2.3.2.post1 Sphinx==4.5.0 sphinx-argparse==0.4.0 sphinx-book-theme==1.0.1 sphinx-comments==0.0.3 sphinx-copybutton==0.5.0 sphinx-jupyterbook-latex==0.5.2 sphinx-multitoc-numbering==0.1.3 sphinx-thebe==0.2.1 sphinx-togglebutton==0.3.2 sphinx_design==0.3.0 sphinx_external_toc==0.3.1 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-bibtex==2.5.0 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 SQLAlchemy==1.4.41 stack-data==0.6.2 statsmodels==0.14.0 strict-rfc3339==0.7 style==1.1.0 suite2p==0.12.1 sympy==1.12 tables==3.7.0 tabulate==0.9.0 tenacity==8.1.0 tensortools==0.4 terminado==0.17.0 threadpoolctl==3.1.0 tifffile==2022.10.10 tinycss2==1.2.1 tomli==2.0.1 toolz==0.12.0 torch==1.13.1 tornado==6.2 tqdm==4.64.1 traitlets==5.6.0 traittypes==0.2.1 treelib==1.6.1 trimesh==3.16.4 typing_extensions==4.4.0 uc-micro-py==1.0.1 update==0.0.1 uri-template==1.2.0 urllib3==1.26.13 util-colleenjg==0.0.1 vedo==2021.0.5 vtk==9.2.2 wcwidth==0.2.5 webcolors==1.12 webencodings==0.5.1 websocket-client==1.4.2 widgetsnbextension==3.6.1 win32-setctime==1.1.0 wrapt==1.14.1 wslink==1.8.4 xarray==2022.11.0 yarl==1.8.1 zarr==2.13.3 zarr-checksum==0.2.9 zipp==3.11.0 zstandard==0.19.0

rly commented 10 months ago

Looking at your code, I see nothing unusual. Would you be able to share that file? You can upload it to this Google Drive folder.

CodyCBakerPhD commented 10 months ago

Missing details from the code: can you show us (a) how the data was wrapped, and (b) how the io was opened?

rcpeene commented 10 months ago

Here's the whole function:

# imports the function relies on
import h5py
from hdmf.backends.hdf5 import H5DataIO
from pynwb import NWBHDF5IO
from pynwb.ophys import OpticalChannel, TwoPhotonSeries

def process_suit2p(raw_params):
    """Adds RAW info to an NWB

    Parameters
    ----------
    raw_params: dict
    Contains the nwb's file path and other data

    Returns
    -------
    """
    print("Processing timeseries data")
    with h5py.File(raw_params['suite_2p'], "r") as suite2p:
        data = suite2p['data']
        wrapped_data = H5DataIO(
            data=data,
            compression='gzip',
            compression_opts=4,
            chunks=True,
            maxshape=(None, 100)
        )
        nwb_file = raw_params['nwb_path']
        io = NWBHDF5IO(nwb_file, "r+", load_namespaces=True)
        input_nwb = io.read()
        try:
            ts = TwoPhotonSeries(
                name='raw_suite2p_motion_corrected',
                imaging_plane=(
                    input_nwb.processing['ophys']['image_segmentation']
                    ['cell_specimen_table'].imaging_plane
                ),
                data=wrapped_data,
                format='raw',
                unit='SIunit',
                rate=10.71
            )
        except KeyError:
            channel = OpticalChannel(
                name='place_holder Channel',
                description='place_holder Channel',
                emission_lambda=488.0
            )
            plane = input_nwb.create_imaging_plane(
                name='imaging_plane',
                optical_channel=channel,
                description='Failed Cell Segmentation',
                device=input_nwb.devices['MESO.2'],
                excitation_lambda=488.0,
                imaging_rate=10.71,
                indicator='GCaMP6f',
                location='Failed Cell Segmentation',
            )
            ts = TwoPhotonSeries(
                name='raw_suite2p_motion_corrected',
                imaging_plane=plane,
                data=wrapped_data,
                format='raw',
                unit='SIunit',
                rate=10.71
            )
        input_nwb.add_acquisition(ts)
        io.write(input_nwb)

rcpeene commented 10 months ago

@rly the file is quite large. I've given you access to dandiset 000336 because that might be faster

CodyCBakerPhD commented 10 months ago

Without seeing the resulting NWB file, I can't tell how it got larger without the dataset being added, but I would guess it has something to do with the fact that the data is, at that point in the code, an h5py.Dataset object from a separate file, which can affect how io.write acts on the H5DataIO-wrapped data.

What I would generally suggest is using the SliceableDataChunkIterator from neuroconv to wrap the suite2p['data'] object, which will buffer the amount loaded into RAM and also guarantee that the slices actually move data from one file to the other.
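For illustration, a rough sketch of that suggestion, assuming SliceableDataChunkIterator is importable from neuroconv.tools.hdmf (the exact import path may vary by neuroconv version) and using a placeholder file path:

import h5py
from hdmf.backends.hdf5 import H5DataIO
from neuroconv.tools.hdmf import SliceableDataChunkIterator  # assumed import path

with h5py.File("suite2p_output.h5", "r") as suite2p:  # placeholder path
    # The iterator reads the source dataset in buffered slices, so io.write copies
    # the actual data into the destination file instead of holding a reference to a
    # dataset that lives in another (soon-to-be-closed) file.
    wrapped_data = H5DataIO(
        data=SliceableDataChunkIterator(data=suite2p["data"]),
        compression="gzip",
        compression_opts=4,
    )
    # ...then pass wrapped_data as the data argument of the TwoPhotonSeries as before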

rly commented 10 months ago

@rly the file is quite large. I've given you access to dandiset 000336 because that might be faster

@rcpeene I downloaded and opened sub-621602_ses-1193555033-acq-1193675745-denoised-movies_ophys.nwb from dandiset 000336. In there, I see the following keys in nwbfile.acquisition in both pynwb and HDFView:

>>> nwbfile.acquisition.keys()
dict_keys(['EyeTracking', 'denoised_suite2p_motion_corrected', 'v_in', 'v_sig'])

Am I looking at the right file? I was expecting to see a raw_suite2p_motion_corrected in HDFView as your code example describes.

I also think @CodyCBakerPhD's solution of using SliceableDataChunkIterator from neuroconv is worth trying.

rcpeene commented 10 months ago

Maybe it's a problem with my environment? What versions of pynwb, h5py, and hdmf are you using?

rcpeene commented 10 months ago

Context: for my purposes, I use two methods from my own imported module, dandi_stream_open and dandi_download_open, to stream or download an NWB file from DANDI and return the io object. For reasons reported in this issue, calling nwb = io.read() and then returning the nwb file does not work from within the imported methods, so the io object is returned and then read in the outer scope.

New info: I discovered that when I open the NWB file directly, or when I stream the file, the 2-photon movie is available. It is only when using dandi_download_open and returning the io object from a separate file that the movie fails to appear. It seems likely that versioning is also a component of this problem, as discussed in the issue cited above.

In light of this new information, are there any solutions in my code to fix this, or will we have to repackage our many 2P movies?

rly commented 10 months ago

Maybe it's a problem with my environment? What versions of pynwb, h5py, and hdmf are you using?

I'm on a Mac with Python 3.11, using the same pynwb and hdmf versions that you are using. My h5py version is 3.10.0 because 3.7 is not supported on Mac M1.

It is only when using dandi_download_open and returning the io object from a separate file that the movie fails to appear.

I'm a bit confused. Can you share this function? It's possible that when adding raw_suite2p_motion_corrected to the file, a link was created instead of the data being copied, and that that link fails to resolve in some contexts. But you said that the file size grows significantly, so I think something else is going on here.
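As an aside, one way to check for that possibility at the HDF5 level is h5py's getlink option; a rough sketch, assuming the standard NWB layout and a placeholder file path:

import h5py

with h5py.File("packaged_file.nwb", "r") as f:  # placeholder path
    # Returns an h5py.HardLink for data stored in the file itself, or an
    # h5py.SoftLink / h5py.ExternalLink if only a link was written.
    link = f.get("acquisition/raw_suite2p_motion_corrected", getlink=True)
    print(type(link))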

rcpeene commented 10 months ago

Yes, not only do the file sizes grow but I can see the movie when viewing the file as a .h5 file with HDFView.

The methods are also in the other issue, but I'll paste the relevant portions here for convenience:

# imports these helpers rely on
import h5py
from dandi import dandiapi, download
from fsspec import filesystem
from fsspec.implementations.cached import CachingFileSystem
from pynwb import NWBHDF5IO

# streams an NWB file remotely from DANDI, opens it, and returns the IO object for the NWB
# dandi_api_key is required to access files from embargoed dandisets
def dandi_stream_open(dandiset_id, dandi_filepath, dandi_api_key=None):
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
    dandiset = client.get_dandiset(dandiset_id)

    file = dandiset.get_asset_by_path(dandi_filepath)
    base_url = file.client.session.head(file.base_download_url)
    file_url = base_url.headers["Location"]

    fs = CachingFileSystem(
        fs=filesystem("http")
    )

    f = fs.open(file_url, "rb")
    file = h5py.File(f)
    io = NWBHDF5IO(file=file, mode='r', load_namespaces=True)
    return io

# downloads an NWB file from DANDI, opens it, and returns the IO object for the NWB
def dandi_download_open(dandiset_id, dandi_filepath, download_loc=None, dandi_api_key=None):
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
    dandiset = client.get_dandiset(dandiset_id)

    file = dandiset.get_asset_by_path(dandi_filepath)
    file_url = file.download_url

    filename = dandi_filepath.split("/")[-1]
    filepath = f"{download_loc}/{filename}"

    download.download(file_url, output_dir=download_loc)
    print(f"Downloaded file to {filepath}")

    print("Opening file")
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io

rcpeene commented 10 months ago

When dandi_download_open is defined within the same file where io.read() is called, rather than being imported, this problem does not occur and the movie is visible.

rly commented 10 months ago

In dandi_download_open, downloading the file and then opening vs opening an existing file should not make a difference, so for debugging, the function can be reduced to:

    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io

I have tried to reproduce the error:

In pynwb_1826/pynwb_1826a.py:

from pynwb import NWBHDF5IO

def open_function():
    filepath = "/Users/rly/Downloads/sub-621602_ses-1193555033-acq-1193675745-denoised-movies_ophys.nwb"
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io

In pynwb_1826/pynwb_1826b.py:

from pynwb_1826a import open_function

def read_function(io):
    nwbfile = io.read()
    print(nwbfile.acquisition.keys())

if __name__ == "__main__":
    io = open_function()
    read_function(io)
    io.close()

On the command line:

test ❯ python pynwb_1826/pynwb_1826b.py
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.6.0 because version 1.8.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'core' version 2.6.0-alpha because version 2.5.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.3.0 because version 0.5.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
dict_keys(['EyeTracking', 'denoised_suite2p_motion_corrected', 'v_in', 'v_sig'])

To be clear, the issue exists on this file, right? Even though the TwoPhotonSeries that I see here is called denoised_suite2p_motion_corrected and not raw_suite2p_motion_corrected?

Ultimately this might be an issue with IO objects and Python scope that is easier to troubleshoot over video chat. Would you be available for a quick call today between 12 and 2pm PT?

rcpeene commented 10 months ago

My code is running in a Jupyter notebook rather than a regular Python script. That could affect the behavior of the imported method and the multiple scopes. I'd be available at 12:30.

rly commented 10 months ago

Ah, I'll test this in a Jupyter notebook. I just sent an invite to your alleninstitute email. Thanks.

rcpeene commented 10 months ago

The error was two of my own mistakes compounded: using the incorrect variable for one file's path, and viewing a different file that in fact did not contain the movies. Solved!