Closed: GoktugAlkan closed this issue 1 year ago
Is there an update concerning this issue?
Sorry for not getting back to you last week.
By default MatNWB should support rewriting attributes. For datasets, the rewrite must be the same shape as before.
By the way, this is done by exporting an open NWB file to its original file location, so you don't have to create the NwbFile object from scratch.
@lawrence-mbf Thanks for the response. The problem is that the dataset we want to insert may be significantly different from the existing one. For example, we may have 1000 fewer spikes than before, so the shape of the dataset would not be preserved.
Therefore, instead of overwriting this field, my idea was to delete all data inside nwb.units first and then populate this field with our revised spike information. Finally, I want to save the file to the same location.
What would be the best way to realize this?
Thanks in advance!
@GoktugAlkan For attributes you can always use H5A.delete for data you don't need.
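For reference, attribute deletion can be sketched in Python with h5py (a third-party package, assumed installed); this is the analogue of the MATLAB `H5A.delete` call mentioned above. Attributes can be removed cleanly and need no repacking:

```python
import os
import tempfile

import h5py  # third-party; assumed installed

path = os.path.join(tempfile.mkdtemp(), "attr_demo.h5")

# Write a file with an attribute we no longer want
with h5py.File(path, "w") as f:
    grp = f.create_group("units")
    grp.attrs["description"] = "old description"

# Deleting an attribute; this is the h5py analogue of MATLAB's H5A.delete
with h5py.File(path, "a") as f:
    del f["units"].attrs["description"]
```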
For datasets, I have currently found no way to rewrite AND resize the data without using chunking. I wonder what pynwb is actually doing under the hood for its pop method, because even low-level MATLAB calls are unable to delete dataset containers as far as I can tell.
The only other way I've found is by "unlinking" the data and repacking: https://www.mathworks.com/matlabcentral/answers/395920-how-can-i-delete-a-dataset-completely-from-a-group-in-a-hdf5-file
No clue how performant this actually is.
@lawrence-mbf Thanks a lot. With the functions in the provided link I am able to delete the field nwb.units. I will report back soon with my final conclusion.
@GoktugAlkan keep in mind there is an oddity of HDF5 that deleted objects still take up space in the file. In the case of a units table, this may not be a major problem, but it can be quite wasteful in some circumstances. To solve this, you should use the h5repack command line utility: https://manpages.ubuntu.com/manpages/lunar/man1/h5repack.1.html
@bendichter Thanks! Concerning this point, I guess that when the data I want to insert into the deleted field is larger than the previous data (i.e., larger than the space the deleted object takes up in the file), there should be no problem. Is this correct?
@GoktugAlkan There is no such guarantee unless you use chunking and/or h5repack. Unlinking just removes references to the data but still keeps the allocated space around.
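As an illustration of what repacking accomplishes, here is a hedged Python/h5py sketch that "repacks" by copying only the live objects into a fresh file. This is a simplified analogue for demonstration only (it ignores root-level attributes and other details); the h5repack utility is the robust tool for real files:

```python
import os
import tempfile

import h5py  # third-party; assumed installed
import numpy as np

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "demo.h5")
dst = os.path.join(tmp, "demo_repacked.h5")

with h5py.File(src, "w") as f:
    f.create_dataset("acquisition/raw", data=np.zeros(500_000))    # ~4 MB
    f.create_dataset("units/spike_times", data=np.arange(1_000.0))

# Unlink the large group; the file keeps its allocated space
with h5py.File(src, "a") as f:
    del f["acquisition"]

# Poor-man's repack: copy every remaining root object to a new file
with h5py.File(src, "r") as fin, h5py.File(dst, "w") as fout:
    for name in fin:
        fin.copy(name, fout)

# The repacked file no longer carries the dead space of the old dataset
```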
@lawrence-mbf @bendichter Since I am preparing pipelines for lab members who use MATLAB, it would be very convenient to have a MATLAB function that applies this repacking. However, it seems that this is not possible in this case. I will try to apply the repacking and report my progress.
If possible, it would be nice to have a method like pop as in pyNWB that applies the resizing/repacking of the file.
PyNWB has the same issue. If you want to remove a dataset and free the space, you either need to write everything to a new file or use h5repack.
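For pipelines that can shell out to the HDF5 command-line tools, a small wrapper around h5repack might look like the following Python sketch. The function name and behavior are assumptions (this helper is not part of MatNWB or pyNWB), and h5repack must be installed with the HDF5 tools:

```python
import shutil
import subprocess


def repack_nwb(src_path: str, dst_path: str) -> bool:
    """Run h5repack to copy the live objects of src_path into dst_path.

    Hypothetical helper, not part of MatNWB or pyNWB. Returns False if
    the h5repack CLI is not found on PATH.
    """
    exe = shutil.which("h5repack")
    if exe is None:
        return False
    # h5repack writes a freshly packed copy; dead space is not carried over
    subprocess.run([exe, src_path, dst_path], check=True)
    return True
```

A MATLAB equivalent could call the same binary via `system('h5repack in.nwb out.nwb')`, which is one way to script the repacking step for lab members.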
@bendichter @lawrence-mbf The repacking works. I tested this on an NWB file from which I deleted the acquisition field. After removing that field, the file still occupied the same amount of disk space, but after using h5repack its size decreased significantly.
@bendichter @lawrence-mbf I did a final test in which I added large datasets to NWB files that had previously been repacked. This also works. In addition, both the repacked files and the repacked-and-repopulated files can be read in pyNWB.
As I said before, it would be nice if you provided a function in matNWB that can handle the deletion of an nwb field and the repacking afterwards.
If you want we can close this issue. Thanks a lot!
Hello,
Currently, I am trying to remove data from an existing NWB file. The file is stored on disk and I load it with `nwbRead`. I want to remove the data in the field `nwb.units`. However, I couldn't find a method like the `pop` method that exists in pyNWB (explained here). Is there a way to resolve this issue?

Background of issue: We created NWB files containing raw data and information on spike times/spike clusters/waveforms that is stored in `nwb.units`, as proposed in your tutorials/explanations. After creating these files, we tuned our spike sorting algorithm again to get cleaner units. Hence, we need to change the information stored in `nwb.units`. That's why I am trying to delete the data in this field and then populate it with the latest spike information. We want to avoid creating the file from scratch.

Many thanks in advance!