VBaratham opened this issue 6 years ago
@VBaratham You can write an external link to another object by using the manager argument of NWBHDF5IO. This writes an external link to a TimeSeries object in acquisition:
from datetime import datetime
from pynwb import NWBFile, NWBHDF5IO, TimeSeries, get_manager

manager = get_manager()

nwb = NWBFile(session_description='aa', identifier='aa', session_start_time=datetime.now())
ts1 = TimeSeries(name='timeseries1', data=[4, 5, 6],
                 unit='na1', rate=1.0, starting_time=0.0)
nwb.add_acquisition(ts1)

with NWBHDF5IO('test_append.nwb', mode='w', manager=manager) as io:
    io.write(nwb)

with NWBHDF5IO('test_append.nwb', 'r', manager=manager) as f:
    nwb_r = f.read()
    nwb = NWBFile(session_description='aa', identifier='aa', session_start_time=datetime.now())
    nwb.add_acquisition(nwb_r.acquisition['timeseries1'])
    with NWBHDF5IO('test_append2.nwb', mode='w', manager=manager) as io:
        io.write(nwb)
What you describe (linking an entire NWBFile object) would at its simplest look like this:
from datetime import datetime
from pynwb import NWBFile, NWBHDF5IO, get_manager

manager = get_manager()

nwb = NWBFile(session_description='aa', identifier='aa', session_start_time=datetime.now())
with NWBHDF5IO('file1.nwb', mode='w', manager=manager) as io:
    io.write(nwb)

with NWBHDF5IO('file1.nwb', 'r', manager=manager) as f:
    nwb = f.read()
    with NWBHDF5IO('file2.nwb', mode='w', manager=manager) as io:
        io.write(nwb)
This would instruct pynwb to write the entire file as an external link to another file. You are correct that this leads to an error, but is this really what you are trying to do?
Hi @bendichter, I think I'm a bit lost. Here's what I want to do:
I have a file called test.nwb. I want to read that in, add some data or something, and then save it out to a file called test2.nwb so that I have two nwb files side by side, the first of which (test.nwb) is missing this added dataset. I don't want the second file to be a link to the first, I want it to have all the data replicated. Is this possible? Or do I have to use some other tool to duplicate the file before adding to one of them (would that work)?
@VBaratham by default PyNWB will in this case create external links in test2.nwb to the containers that have previously been written to test.nwb. If you want all the data from test.nwb to be copied to test2.nwb, then simply set link_data=False on write, which will change the default behavior to copy the data rather than link it. I.e.:
with NWBHDF5IO('file2.nwb', mode='w', manager=manager) as io:
    io.write(nwb, link_data=False)
Thanks for the clarification, Oliver. We should add that to the tutorials.
@bendichter what do you think the best place for this is? Maybe in the modular storage tutorial.
Yeah, or maybe in advanced I/O.
~/devel/pynwb/docs/gallery/domain (master) $ git diff .
diff --git a/docs/gallery/domain/icephys.py b/docs/gallery/domain/icephys.py
index 965270ed..82a07c3b 100644
--- a/docs/gallery/domain/icephys.py
+++ b/docs/gallery/domain/icephys.py
@@ -150,3 +150,11 @@ elec = ccss.electrode
# And the device name via :py:meth:`~pynbwb.file.NWBFile.get_device`
device = nwbfile.get_device('Heka ITC-1600')
+
+io.close()
+
+io = NWBHDF5IO('icephys_example.nwb', 'w')
+nwbfile = io.read()
+nwbfile.add_acquisition(vcs)
+io.write(nwbfile)
+io.close()
~/devel/pynwb/docs/gallery/domain (master) $ python icephys.py
Traceback (most recent call last):
  File "icephys.py", line 157, in <module>
    nwbfile = io.read()
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/backends/io.py", line 33, in read
    container = self.__manager.construct(f_builder)
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/build/map.py", line 195, in construct
    result = self.__type_map.construct(builder, self)
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/build/map.py", line 1561, in construct
    attr_map = self.get_map(builder)
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/build/map.py", line 1492, in get_map
    container_cls = self.get_cls(obj)
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/.local/lib/python2.7/site-packages/hdmf/build/map.py", line 1424, in get_cls
    raise ValueError("No data_type found for builder %s" % builder.path)
ValueError: No data_type found for builder root
~/devel/pynwb/docs/gallery/domain (master) $ git rev-parse HEAD
8f472d6c5b74a72e57073fca4c8100a60dfc16a1
~/devel/pynwb/docs/gallery/domain (master) $
This has nothing to do with external links.
@t-b in your example, you are opening the file in write mode but then reading from it, which is not allowed. We should check if that happens and throw a user-friendly error if so.
If you change 'w' to 'r+', then you get the correct error:
Traceback (most recent call last):
  File "icephys.py", line 159, in <module>
    nwbfile.add_acquisition(vcs)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "c:\users\ryan\documents\nwb\pynwb\src\pynwb\file.py", line 617, in add_acquisition
    self._add_acquisition_internal(nwbdata)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "c:\users\ryan\documents\nwb\pynwb\src\pynwb\core.py", line 743, in _func
    raise ValueError(msg)
ValueError: 'vcs' already exists in 'root'
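For reference, a minimal append-in-place sketch (new_series here is a stand-in for a container that is not already in the file):

from pynwb import NWBHDF5IO

with NWBHDF5IO('icephys_example.nwb', mode='r+') as io:
    nwbfile = io.read()
    nwbfile.add_acquisition(new_series)  # new_series: a container not already in the file
    io.write(nwbfile)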
@rly Thanks, granted this was sloppy.
Next try:
~/devel/pynwb (better-validation) $ git diff .
diff --git a/docs/gallery/domain/icephys.py b/docs/gallery/domain/icephys.py
index bfff2ed6..448f529a 100644
--- a/docs/gallery/domain/icephys.py
+++ b/docs/gallery/domain/icephys.py
@@ -150,3 +150,22 @@ elec = ccss.electrode
# And the device name via :py:meth:`~pynbwb.file.NWBFile.get_device`
device = nwbfile.get_device('Heka ITC-1600')
+
+io.close()
+
+ioread = NWBHDF5IO('icephys_example.nwb', 'r')
+
+vcs2 = VoltageClampSeries(
+ name='vcs2', data=[0.1, 0.2, 0.3, 0.4, 0.5],
+ unit='A', conversion=1e-12, resolution=np.nan, starting_time=123.6, rate=20e3,
+ electrode=elec, gain=0.02, capacitance_slow=100e-12, resistance_comp_correction=70.0,
+ capacitance_fast=np.nan, resistance_comp_bandwidth=np.nan, resistance_comp_prediction=np.nan,
+ whole_cell_capacitance_comp=np.nan, whole_cell_series_resistance_comp=np.nan)
+
+nwbfile = ioread.read()
+nwbfile.add_acquisition(vcs2)
+
+io = NWBHDF5IO('icephys_example2.nwb', 'w')
+io.write(nwbfile)
+io.close()
+ioread.close()
~/devel/pynwb (better-validation) $ python docs/gallery/domain/icephys.py
Traceback (most recent call last):
  File "docs/gallery/domain/icephys.py", line 169, in <module>
    io.write(nwbfile)
  File "/home/firma/devel/hdmf/src/hdmf/utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/devel/hdmf/src/hdmf/backends/hdf5/h5tools.py", line 196, in write
    call_docval_func(super(HDF5IO, self).write, kwargs)
  File "/home/firma/devel/hdmf/src/hdmf/utils.py", line 281, in call_docval_func
    return func(*fargs, **fkwargs)
  File "/home/firma/devel/hdmf/src/hdmf/utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/devel/hdmf/src/hdmf/backends/io.py", line 39, in write
    f_builder = self.__manager.build(container, source=self.__source)
  File "/home/firma/devel/hdmf/src/hdmf/utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "/home/firma/devel/hdmf/src/hdmf/build/map.py", line 158, in build
    raise ValueError("Can't change container_source once set")
ValueError: Can't change container_source once set
~/devel/pynwb (better-validation) $
@oruebel @bendichter The problem seems to persist, even with link_data=False. Here's a reproducible error:
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, NWBHDF5IO, get_manager, TimeSeries
import nwbext_ecog
import numpy as np
import os

manager = get_manager()

# Creates file 1
nwb_1 = NWBFile(session_description='session', identifier='1', session_start_time=datetime.now(tzlocal()))

# Add signal
X_data = np.zeros((10, 1000))
timestamps = list(range(1000))
test_ts = TimeSeries(name='test_timeseries', data=X_data, unit='s', timestamps=timestamps)
nwb_1.add_acquisition(test_ts)
with NWBHDF5IO('file_1.nwb', mode='w') as io:
    io.write(nwb_1)

# Open file 1
with NWBHDF5IO('file_1.nwb', 'r', manager=manager) as io_1:
    nwb_1 = io_1.read()
    # Open file 2
    with NWBHDF5IO('file_2.nwb', mode='w', manager=manager) as io_2:
        nwb_2 = NWBFile(session_description='session', identifier='2', session_start_time=datetime.now(tzlocal()))
        for aux in nwb_1.acquisition:  # Acquisition
            nwb_2.add_acquisition(nwb_1.get_acquisition(aux))
        io_2.write(nwb_2, link_data=False)
        print(nwb_2.acquisition['test_timeseries'])
file_2.nwb was created and apparently has something in the acquisition field. However, the file sizes are different and, when I delete file_1.nwb, reading from file_2.nwb raises an error.
# Open file 2 after deleting file 1
with NWBHDF5IO('file_2.nwb', 'r', manager=manager) as io:
    nwb_2 = io.read()
    print(nwb_2.acquisition['test_timeseries'])
AttributeError: 'NoneType' object has no attribute 'name'
re: @oruebel's comment earlier about adding link_data=False: that flag applies only to datasets. We do not support copying selected containers just yet, so yes, @luiztauffer's example fails because the 'test_timeseries' TimeSeries is a link, which is broken after file 1 is deleted.
https://github.com/hdmf-dev/hdmf/pull/108 will be able to help with that, but this will take a little time to tackle.
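In the meantime, one way to get a genuine copy of a single container is to build a fresh one from the data read out of file 1, instead of re-adding the read container. A sketch using the names from the example above (slicing with [:] materializes the data in memory, so nothing links back to file_1.nwb):

ts_in = nwb_1.get_acquisition('test_timeseries')
ts_copy = TimeSeries(name=ts_in.name, data=ts_in.data[:], unit=ts_in.unit,
                     timestamps=ts_in.timestamps[:])  # copy data out of file 1
nwb_2.add_acquisition(ts_copy)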
FYI: a full copy of a file is supported by HDF5IO.copy_file.
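A sketch of that (the parameter names here are my reading of the HDMF signature, so check the docs; expand_external controls whether external links are resolved into real copies in the destination):

from hdmf.backends.hdf5 import HDF5IO

HDF5IO.copy_file(source_filename='file_1.nwb',
                 dest_filename='file_1_copy.nwb',
                 expand_external=True)  # resolve external links into real copies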
@rly I am running into the same set of problems (container source, default linking) while attempting to carry out the same read -> modify -> write-separately workflow. As a workaround, I can reset the private container_source on each container before writing; see the sketch below.
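Roughly (a sketch, assuming hdmf's name-mangled attribute and its all_children helper, which walks the container tree including the file itself):

for obj in nwbfile.all_children():
    # clear the recorded source so a subsequent write can set it fresh
    obj._AbstractContainer__container_source = None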
This is pretty gross. Is there a better way? What is the actual purpose of container_source if it is being reset on each write?
If there is not a better way currently, when do you think these issues could be addressed? We are using this functionality (via workaround) to process data we hope to release in June and it would be great if we could have clean code in place to process those data.
@NileGraddis would this https://pynwb.readthedocs.io/en/stable/tutorials/general/scratch.html#copying-an-nwb-file workflow work for you?
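That tutorial's workflow is roughly this (a sketch with illustrative file names; NWBFile.copy makes a shallow copy whose containers remain links to the source file):

raw_io = NWBHDF5IO('raw_data.nwb', mode='r')
nwb_in = raw_io.read()
nwb_proc = nwb_in.copy()  # shallow copy; containers still link to raw_data.nwb
# ... add processing results to nwb_proc ...
with NWBHDF5IO('processed.nwb', mode='w', manager=raw_io.manager) as io:
    io.write(nwb_proc)
raw_io.close()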
@oruebel Thanks for the quick reply! I think that is almost there, except that I would like the copy to be deep rather than shallow: we're using this in a processing pipeline which incrementally adds to an NWB file, which ought then to stand alone.
@NileGraddis if you want a full physical copy of the file, then you could just copy the file first and then open the copy. HDF5IO.copy_file can do this (which is useful if you need to resolve external links), or you could just use standard Python:
from shutil import copyfile
copyfile(src, dst)
to copy the file.
@oruebel Copying the file is certainly another workaround, though it requires that the intended output is an h5 file on disk, which might not be true in the future!

I suppose that the main concern I have about my workaround is container_source: I'm not thrilled to rely on updating a private attribute, which might be changed in the future! I am also a bit confused about the purpose of container_source. It seems like it might represent:
- the source file a container was read from, but then why does NWBHDF5IO try (and fail) to reset it on write?
- something akin to parent, but the documentation doesn't explain why it is read-only.
What is the intent behind container_source? Would it make sense to make a simple modification to the NWBHDF5IO that would resolve this issue, like only trying to set container_source if it is not already set?
Hi, I have the same use case: I would like to copy a few of the data interfaces below, which I get in an NWB file from Suite2p, to another main NWB file.
ipdb> nwb2p.processing
{'ophys': ophys pynwb.base.ProcessingModule at 0x140110771702800
Fields:
  data_interfaces: {
    Backgrounds_0 <class 'pynwb.base.Images'>,
    Backgrounds_1 <class 'pynwb.base.Images'>,
    Backgrounds_2 <class 'pynwb.base.Images'>,
    Deconvolved <class 'pynwb.ophys.Fluorescence'>,
    Fluorescence <class 'pynwb.ophys.Fluorescence'>,
    ImageSegmentation <class 'pynwb.ophys.ImageSegmentation'>,
    Neuropil <class 'pynwb.ophys.Fluorescence'>
  }
  description: optical physiology processed data
}
As long as the NWB object is in memory, everything is fine, but when I try to write it to disk, I get the same error, with container_source referencing the old NWB file. Also, the problem with copying as suggested here is that I only want some parts of the file.
For now I ended up modifying the _AbstractContainer__container_source attribute of each data interface, which is probably not the way to go.
https://github.com/hdmf-dev/hdmf/pull/326 should address this issue.
It seems that this has been fixed, hasn't it? I can find an export method in the HDMF docs, but I can't find any NWBHDF5IO.export() as written here.
If you use the latest PyNWB with the latest HDMF, then you can call the export method on NWBHDF5IO. Technically, conda and PyPI will say that the latest PyNWB is not compatible with the latest HDMF. This will be fixed in the next release, so I'll close this issue then.
Removal of objects from an in-memory NWBFile before exporting is not supported in PyNWB yet, but will be supported in an upcoming release.
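The export workflow then looks roughly like this ('in.nwb' and 'out.nwb' are placeholder names):

from pynwb import NWBHDF5IO

with NWBHDF5IO('in.nwb', mode='r') as read_io:
    nwbfile = read_io.read()
    # ... modify nwbfile in memory here ...
    with NWBHDF5IO('out.nwb', mode='w') as export_io:
        export_io.export(src_io=read_io, nwbfile=nwbfile)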
1) Bug
Loading an NWBFile from one file, then saving it to another causes ValueError: Can't change container_source once set to be thrown.