Closed. rcpeene closed this issue 10 months ago.
Looking at your code, I see nothing unusual. Would you be able to share that file? You can upload it to this Google Drive folder.
Missing details from the code: can you show us (a) how the data was wrapped, and (b) how the io was opened?
Here's the whole function:

```python
# imports implied by the function body
import h5py
from pynwb import NWBHDF5IO, H5DataIO
from pynwb.ophys import OpticalChannel, TwoPhotonSeries


def process_suit2p(raw_params):
    """Adds RAW info to an NWB

    Parameters
    ----------
    raw_params: dict
        Contains the nwb's file path and other data

    Returns
    -------
    """
    print("Processing timeseries data")
    with h5py.File(raw_params['suite_2p'], "r") as suite2p:
        data = suite2p['data']
        wrapped_data = H5DataIO(
            data=data,
            compression='gzip',
            compression_opts=4,
            chunks=True,
            maxshape=(None, 100)
        )
        nwb_file = raw_params['nwb_path']
        io = NWBHDF5IO(nwb_file, "r+", load_namespaces=True)
        input_nwb = io.read()
        try:
            # reuse the imaging plane from the existing cell segmentation, if present
            ts = TwoPhotonSeries(
                name='raw_suite2p_motion_corrected',
                imaging_plane=(
                    input_nwb.processing['ophys']['image_segmentation']
                    ['cell_specimen_table'].imaging_plane
                ),
                data=wrapped_data,
                format='raw',
                unit='SIunit',
                rate=10.71
            )
        except KeyError:
            # no segmentation in the file; create a placeholder imaging plane
            channel = OpticalChannel(
                name='place_holder Channel',
                description='place_holder Channel',
                emission_lambda=488.0
            )
            plane = input_nwb.create_imaging_plane(
                name='imaging_plane',
                optical_channel=channel,
                description='Failed Cell Segmentation',
                device=input_nwb.devices['MESO.2'],
                excitation_lambda=488.0,
                imaging_rate=10.71,
                indicator='GCaMP6f',
                location='Failed Cell Segmentation',
            )
            ts = TwoPhotonSeries(
                name='raw_suite2p_motion_corrected',
                imaging_plane=plane,
                data=wrapped_data,
                format='raw',
                unit='SIunit',
                rate=10.71
            )
        input_nwb.add_acquisition(ts)
        io.write(input_nwb)
        io.close()  # (added) close the handle so the write is flushed to disk
```
@rly the file is quite large. I've given you access to dandiset 000336 because that might be faster
Without seeing the resulting NWB file to understand how it got larger without the dataset being added, I would guess it has something to do with the fact that the data is, at that point in the code, an h5py.Dataset object from a separate file, which can affect the way io.write acts on the H5DataIO compressor.
What I would generally suggest is using the SliceableDataChunkIterator from neuroconv to wrap the suite2p['data'] object, which will buffer the amount of data loaded into RAM and guarantee that the slices actually move data from one file to the other.
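A minimal sketch of that suggestion (untested here; the import path assumes a recent neuroconv release, and raw_params comes from the function above):

```python
# Sketch: wrap the suite2p dataset in SliceableDataChunkIterator so io.write()
# streams it slice by slice instead of relying on the live h5py.Dataset handle.
# H5DataIO can wrap the iterator to keep the gzip options.
import h5py
from neuroconv.tools.hdmf import SliceableDataChunkIterator
from pynwb import H5DataIO

with h5py.File(raw_params['suite_2p'], "r") as suite2p:
    wrapped_data = H5DataIO(
        data=SliceableDataChunkIterator(data=suite2p['data']),
        compression='gzip',
        compression_opts=4,
    )
    # ... build the TwoPhotonSeries with wrapped_data and write as before,
    # keeping the write inside this `with` block so the source file stays open
```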
> @rly the file is quite large. I've given you access to dandiset 000336 because that might be faster
@rcpeene I downloaded and opened sub-621602_ses-1193555033-acq-1193675745-denoised-movies_ophys.nwb from dandiset 000336. In both pynwb and HDFView, I see these keys in nwbfile.acquisition:

```python
>>> nwbfile.acquisition.keys()
dict_keys(['EyeTracking', 'denoised_suite2p_motion_corrected', 'v_in', 'v_sig'])
```

Am I looking at the right file? I was expecting to see a raw_suite2p_motion_corrected series in HDFView, as your code example describes.
I also think @CodyCBakerPhD's solution of using SliceableDataChunkIterator from neuroconv is worth trying.
Maybe it's a problem with my environment? What versions of pynwb, h5py, and hdmf are you using?
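(A quick way to print those versions from the running environment, using only the standard library:)

```python
# Print the installed versions of the packages in question.
from importlib.metadata import version

for pkg in ("pynwb", "h5py", "hdmf"):
    print(pkg, version(pkg))
```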
Context:
For my purposes, I use two methods from my own imported module, dandi_stream_open and dandi_download_open, to stream or download an NWB file from DANDI and return the io object. For reasons reported in this issue, calling nwb = io.read() and then returning the nwb file fails to work from the imported methods, so the io object is returned and then read in the outer scope.
New Info:
I discovered that when I open the NWB file directly, or when I stream the file, the 2-photon movie is available. It is only when using dandi_download_open and returning the io object from a separate file that the movie fails to appear. It seems likely that versioning is also a component in this problem, as discussed in the cited issue above.
In light of this new information, are there any solutions in my code to fix this, or will we have to repackage our many 2P movies?
> Maybe it's a problem with my environment? What versions of pynwb, h5py, and hdmf are you using?

I'm on a Mac with Python 3.11 using the same pynwb and hdmf versions that you are using. My h5py version is 3.10.0 because 3.7 is not supported on Mac M1.
> It is only when using dandi_download_open and returning the io object from a separate file that the movie fails to appear.
I'm a bit confused. Can you share this function? It's possible that when adding raw_suite2p_motion_corrected to the file, a link was created instead of the data being copied, and that the link fails in some contexts; but you said that the file size grows significantly, so I think something else is going on here.
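(One hedged way to check for such a link with plain h5py; the group and series names are taken from the code earlier in the thread, and the file path is a placeholder:)

```python
# Sketch: inspect whether the series group or its data member is stored as a
# link rather than copied data. h5py's get(..., getlink=True) returns a
# HardLink, SoftLink, or ExternalLink object without dereferencing it.
import h5py

with h5py.File("path/to/output.nwb", "r") as f:  # placeholder path
    acq = f["acquisition"]
    print(type(acq.get("raw_suite2p_motion_corrected", getlink=True)))
    series = acq["raw_suite2p_motion_corrected"]
    print(type(series.get("data", getlink=True)))  # ExternalLink would point at the suite2p file
```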
Yes, not only do the file sizes grow but I can see the movie when viewing the file as a .h5 file with HDFView.
The methods are also in the other issue, but I'll paste the relevant portions here for convenience:
```python
# imports implied by the function bodies
import h5py
from dandi import dandiapi, download
from fsspec import filesystem
from fsspec.implementations.cached import CachingFileSystem
from pynwb import NWBHDF5IO


# streams an NWB file remotely from DANDI, opens it, and returns the IO object for the NWB
# dandi_api_key is required to access files from embargoed dandisets
def dandi_stream_open(dandiset_id, dandi_filepath, dandi_api_key=None):
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
    dandiset = client.get_dandiset(dandiset_id)
    file = dandiset.get_asset_by_path(dandi_filepath)
    base_url = file.client.session.head(file.base_download_url)
    file_url = base_url.headers["Location"]
    fs = CachingFileSystem(
        fs=filesystem("http")
    )
    f = fs.open(file_url, "rb")
    file = h5py.File(f)
    io = NWBHDF5IO(file=file, mode='r', load_namespaces=True)
    return io


# downloads an NWB file from DANDI, opens it, and returns the IO object for the NWB
def dandi_download_open(dandiset_id, dandi_filepath, download_loc=None, dandi_api_key=None):
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
    dandiset = client.get_dandiset(dandiset_id)
    file = dandiset.get_asset_by_path(dandi_filepath)
    file_url = file.download_url
    filename = dandi_filepath.split("/")[-1]
    filepath = f"{download_loc}/{filename}"  # note: yields "None/<filename>" if download_loc is None
    download.download(file_url, output_dir=download_loc)
    print(f"Downloaded file to {filepath}")
    print("Opening file")
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io
```
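A hypothetical usage of these helpers, mirroring the flow described above (the asset path is a placeholder):

```python
# Download and open the file, then read in the outer scope as described earlier.
io = dandi_download_open("000336", "sub-621602/<asset_filename>.nwb", download_loc=".")
nwb = io.read()
print(nwb.acquisition.keys())
io.close()
```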
When dandi_download_open is defined within the same file where io.read() is called, rather than imported, this problem does not occur and the movie is visible.
In dandi_download_open, downloading the file and then opening it vs. opening an existing file should not make a difference, so for debugging, the function can be reduced to:

```python
io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
return io
```
I have tried to reproduce the error:
In pynwb_1826/pynwb_1826a.py:

```python
from pynwb import NWBHDF5IO


def open_function():
    filepath = "/Users/rly/Downloads/sub-621602_ses-1193555033-acq-1193675745-denoised-movies_ophys.nwb"
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io
```
In pynwb_1826/pynwb_1826b.py:

```python
from pynwb_1826a import open_function


def read_function(io):
    nwbfile = io.read()
    print(nwbfile.acquisition.keys())


if __name__ == "__main__":
    io = open_function()
    read_function(io)
    io.close()
```
On the command line:

```
test ❯ python pynwb_1826/pynwb_1826b.py
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.6.0 because version 1.8.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'core' version 2.6.0-alpha because version 2.5.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/Users/rly/mambaforge/envs/test/lib/python3.11/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.3.0 because version 0.5.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
dict_keys(['EyeTracking', 'denoised_suite2p_motion_corrected', 'v_in', 'v_sig'])
```
To be clear, the issue exists on this file, right? Even though the TwoPhotonSeries that I see here is called denoised_suite2p_motion_corrected and not raw_suite2p_motion_corrected?
Ultimately this might be an issue with IO objects and Python scope that is easier to troubleshoot over video chat. Would you be available for a quick call today between 12 and 2pm PT?
My code is running in a Jupyter notebook rather than regular Python. This could affect the issue of the imported method and multiple scopes. I'd be available at 12:30.
Ah, I'll test this in a Jupyter notebook. I just sent an invite to your alleninstitute email. Thanks.
The error was two mistakes confounded on my part: using the incorrect variable path for one file, and viewing a different file that in fact did not contain the movies. Solved!
What happened?
We have packaged our NWB files and added 2-photon movies to them. This is evidenced by the fact that the file grows significantly in size, and that the movie can be seen when exploring the file in .h5 format. However, when opening with PyNWB, the object does not appear in nwb.acquisition.
Steps to Reproduce
Traceback
Operating System
Windows
Python Executable
Python
Python Version
3.9
Package Versions