NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License

overwrite DynamicTable vectordata #273

Closed nmarkowitz closed 3 years ago

nmarkowitz commented 3 years ago

Hello,

Sorry for the frequent posts, but I've been trying this for a while and I'm stuck. Is there a way to amend the data contained in an existing VectorData object in a DynamicTable? I've tried a few approaches, including the following:

% Read file and set variables
nwb_filename = 'sample.nwb';
nwb = nwbRead(nwb_filename);
nwbtable = nwb.general_extracellular_ephys_electrodes; % or any DynamicTable
numrows = 100;
newX = zeros(numrows,1);

% Method1
nwbtable.vectordata.remove('x');
types.util.dynamictable.addRawData(nwbtable,'x',newX);
nwbExport(nwb,nwb_filename);

% Method2
nwbtable.vectordata.set('x', types.hdmf_common.VectorData('data', newX, 'description', 'x coordinate'));
nwbExport(nwb,nwb_filename);

% Method3
nwbtable.vectordata.get('x').data = newX;
nwbExport(nwb,nwb_filename);

The 'x' column is just an example; I've also tried columns that are not mandatory and that contain strings. Or is this not supported, and should I instead make a new NWB file if I want to amend data? Thank you in advance.

Noah

lawrence-mbf commented 3 years ago

Method 3 should be valid but what was the size of the 'x' column initially? Also, what error do you get when you attempt Method 3?

nmarkowitz commented 3 years ago

I apologize, you're right. I'm trying to integrate this into a loop, and it gets stuck when the new data is a cell array containing strings. For instance:

new_column_data = repmat({'FALSE'},100,1);
nwbtable.vectordata.get('bad').data = new_column_data;
nwbExport(nwb,nwb_filename)

Warning: Attempted to change size of continuous dataset `/file_create_date`.  Skipping. 
> In io.writeDataset (line 25)
  In types.core.NWBFile/export (line 521)
  In NwbFile/export (line 61)
  In nwbExport (line 35) 
Error using hdf5lib2
The memory type is incompatible with H5T_ENUM.  Consider using 'H5ML_DEFAULT' instead.

Error in H5D.write (line 100)
H5ML.hdf5lib2('H5Dwrite', varargin{:});

Error in io.writeDataset (line 34)
H5D.write(did, tid, sid, sid, 'H5P_DEFAULT', data);

Error in types.untyped.MetaClass/write_base (line 22)
                    io.writeDataset(fid, fullpath, obj.data, 'forceArray');

Error in types.untyped.MetaClass/export (line 65)
            refs = obj.write_base(fid, fullpath, refs);

Error in types.hdmf_common.Data/export (line 39)
        refs = export@types.untyped.MetaClass(obj, fid, fullpath, refs);

Error in types.hdmf_common.VectorData/export (line 42)
        refs = export@types.hdmf_common.Data(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.hdmf_common.DynamicTable/export (line 90)
            refs = obj.vectordata.export(fid, fullpath, refs);

Error in types.core.NWBFile/export (line 565)
            refs = obj.general_extracellular_ephys_electrodes.export(fid, [fullpath '/general/extracellular_ephys/electrodes'], refs);

Error in NwbFile/export (line 61)
                refs = export@types.core.NWBFile(obj, output_file_id, '/', {});

Error in nwbExport (line 35)
    export(nwb(i), filename);

The file create date error occasionally occurs but usually doesn't seem to affect writing.

lawrence-mbf commented 3 years ago

If you're not certain of the final data size, you should write the file with a DataPipe configured with an arbitrary maximum size (which chunks the data). That way, you can use the append methods later to add data beyond what was initially written. It is important to keep track of the allocated size, as HDF5 is strict about dataset sizes.

Incidentally, DataPipe is also one of the (few) classes that has in-source documentation, so calling help types.untyped.DataPipe should display a properly formatted doc page.

If the data property on the VectorData is not a DataPipe, then the most you can do is rewrite the data that has already been allocated.
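To illustrate the idea, here is a minimal sketch of an appendable column backed by a DataPipe. The column name 'x' and file name 'sample.nwb' are hypothetical, and this assumes the DataPipe constructor options and append method described above:

```matlab
% Sketch: create a column whose data can grow after the file is written.
% 'x' and 'sample.nwb' are illustrative names, not from a specific dataset.
pipe = types.untyped.DataPipe( ...
    'data', zeros(100, 1), ...   % initial 100 rows
    'maxSize', [Inf, 1], ...     % first dimension may grow without bound
    'axis', 1);                  % append along the first dimension
col = types.hdmf_common.VectorData( ...
    'data', pipe, ...
    'description', 'appendable example column');

% Later, after nwbExport and nwbRead, more rows could be appended:
% nwb = nwbRead('sample.nwb');
% nwb.general_extracellular_ephys_electrodes.vectordata.get('x').data.append(zeros(10, 1));
```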

lawrence-mbf commented 3 years ago

Correction: if you set the maxSize property of the DataPipe to Inf, then an unbounded size is possible (but only along that dimension axis; see the docs).

bendichter commented 3 years ago

@sportnoah14 check out this tutorial, which covers iterative writing: https://youtu.be/PIE_F4iVv98

nmarkowitz commented 3 years ago

@ln-vidrio I've tried playing with the DataPipe object, but it's still not working. I've also tried rewriting the whole table, but that still threw an error. If I try to make a DataPipe object, this is the error I get:

new_column_data = repmat({'FALSE'},100,1);
new_column_data = types.untyped.DataPipe('data',new_column_data,'maxSize',Inf,'axis',1);
Error using types.untyped.datapipe.Configuration/set.dataType (line 80)
Datatypes must be one of the following:
single|double|uint8|int8|uint16|int16|uint32|int32|uint64|int64

Error in types.untyped.DataPipe (line 140)
                config.dataType = class(p.Results.data);

The data field for this VectorData is stored as a DataStub object. I don't want to append data, only replace what is currently written (such as changing a string in one cell of the table). If the DynamicTable has not yet been allocated, then it works fine. Similarly, the file doesn't save if I try to remove a column using something such as:

nwbtable.vectordata.remove('bad')
nwbtable.colnames = nwbtable.vectordata.keys
nwbExport(nwb,nwb_filename)

@bendichter that's a great video. I'll definitely be referencing it for compression.

Also thank you for responding so quickly to all these posts.

bendichter commented 3 years ago

@sportnoah14

new_column_data = types.untyped.DataPipe( ... 
    'data', zeros(100,1), ...
    'maxSize', [Inf, 1], ...
    'axis', 1 ...
);

bendichter commented 3 years ago

Oh, you don't want to append data... so why are you creating a DataPipe?

lawrence-mbf commented 3 years ago

Ah, I just missed that you were working with cell arrays of strings. As the error points out, DataPipe currently only works with numeric types. Since you're going to rewrite the data, however, this shouldn't be an issue.

Going back to your original problem, I'm actually unable to reproduce your problem with a similar test using Method 3 which you mentioned above. Looking more closely at your error stack, it seems like the internal HDF5 library is treating one of your strings as an enum for some reason, maybe because the strings are true/false? What version of MatNWB are you using and what version of MATLAB?

bendichter commented 3 years ago

I know what's happening here.

HDF5 does not internally support boolean values. h5py handles this by taking any boolean array and mapping it to an enumerated type with 0 mapping to "FALSE" and 1 mapping to "TRUE". When this data is read in Python, it is read back into memory as a boolean array, but when it is read in MATLAB, it is read as an enumerated type.

lawrence-mbf commented 3 years ago

Okay so cell strings of 'TRUE' and 'FALSE' values should be read in as logical arrays and vice versa? We do not currently save them that way, unfortunately.

lawrence-mbf commented 3 years ago

@sportnoah14 Can you run nwbtest('Name', 'tests.system.DynamicTableTest/testAmend'); on PR #274 ? The test case overwrites TRUE values with FALSE for a given dynamic table.

nmarkowitz commented 3 years ago

Hi @ln-vidrio here is the output from that command after I pulled that branch:

>> nwbtest('Name', 'tests.system.DynamicTableTest/testAmend');
.

results = 

  TestResult with properties:

          Name: 'tests.system.DynamicTableTest/testAmend'
        Passed: 1
        Failed: 0
    Incomplete: 0
      Duration: 11.8985
       Details: [1×1 struct]

Totals:
   1 Passed, 0 Failed, 0 Incomplete.
   11.8985 seconds testing time.

I'm working with NWB version 2.2.5 in both MATLAB and Python. I'm using MATLAB R2018b. I created the NWB file using pynwb and am now trying to read it into MATLAB with matnwb.

So this should make it work with boolean values? What about strings other than 'FALSE' and 'TRUE' such as the string in the ElectrodeTable location column? Is that doable or would I need to write a whole new nwb file for that?

lawrence-mbf commented 3 years ago

It seems that even if you used the strings TRUE and FALSE instead of boolean data it should work, unless h5py saves these specific TRUE/FALSE values as static strings.

The special boolean support is something I'll need to address later but if testAmend worked for you then you should be able to rewrite the data there regardless of type (as they'd both just be variable length strings in HDF5 anyway).

nmarkowitz commented 3 years ago

I just did some experimenting. The NWB file I initially wrote stored the "bad" column as "8-bit enum (0=FALSE, 1=TRUE)". When read into MATLAB, the values were turned into strings of "TRUE" and "FALSE". I tried writing the file again from scratch, and amending the column works if it was first written as a float64 value. However, I still can't amend strings. I get a different sort of error:

>> nwbtable.vectordata.get('location').data = repmat({'hello'},205,1);
>> nwbExport(nwb,f)
Warning: Attempted to change size of continuous dataset `/file_create_date`.  Skipping. 
> In io.writeDataset (line 25)
  In types.core.NWBFile/export (line 521)
  In NwbFile/export (line 61)
  In nwbExport (line 35) 
Warning: Attempted to change size of continuous dataset `/general/extracellular_ephys/electrodes/location`.  Skipping. 
> In io.writeDataset (line 25)
  In types.untyped.MetaClass/write_base (line 22)
  In types.untyped.MetaClass/export (line 65)
  In types.hdmf_common.Data/export (line 39)
  In types.hdmf_common.VectorData/export (line 42)
  In types.untyped.Set/export (line 181)
  In types.hdmf_common.DynamicTable/export (line 90)
  In types.core.NWBFile/export (line 565)
  In NwbFile/export (line 61)
  In nwbExport (line 35) 
Error using hdf5lib2
The HDF5 library encountered an unknown error.

Error in H5D.write (line 100)
H5ML.hdf5lib2('H5Dwrite', varargin{:});

Error in io.writeDataset (line 34)
H5D.write(did, tid, sid, sid, 'H5P_DEFAULT', data);

Error in types.untyped.MetaClass/write_base (line 22)
                    io.writeDataset(fid, fullpath, obj.data, 'forceArray');

Error in types.untyped.MetaClass/export (line 65)
            refs = obj.write_base(fid, fullpath, refs);

Error in types.hdmf_common.Data/export (line 39)
        refs = export@types.untyped.MetaClass(obj, fid, fullpath, refs);

Error in types.hdmf_common.VectorData/export (line 42)
        refs = export@types.hdmf_common.Data(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.hdmf_common.DynamicTable/export (line 90)
            refs = obj.vectordata.export(fid, fullpath, refs);

Error in types.core.NWBFile/export (line 565)
            refs = obj.general_extracellular_ephys_electrodes.export(fid, [fullpath '/general/extracellular_ephys/electrodes'], refs);

Error in NwbFile/export (line 61)
                refs = export@types.core.NWBFile(obj, output_file_id, '/', {});

Error in nwbExport (line 35)
    export(nwb(i), filename);

Perhaps it doesn't like the fact that the string size is changing?

bendichter commented 3 years ago

Can you try nwbtable.vectordata.get('location').data = uint8(ones(1, 205))

nmarkowitz commented 3 years ago

@bendichter I tried assigning uint8(ones(1, 205)) to both the location (contains strings) and bad (contains booleans) columns. Both threw an error when I tried to export after assigning it.

nmarkowitz commented 3 years ago

I think I just figured it out after your suggestions. It doesn't like that the strings change size. I ran this:

c = nwbtable.vectordata.get('location').data.load_h5_style;
new_data = cell(length(c),1);
for ii = 1:length(c)
    this_str_len = length(c{ii});
    new_data{ii} = repmat('s',1,this_str_len);
end
nwbtable.vectordata.get('location').data = new_data;

% Save the file
nwbExport(nwb,f)

I just replaced each character in the strings with an "s", so the length of each string remained the same. As a result, it saved fine. This is good to know for when I create these files and for the extent to which I can edit them. Do you think there's any way to allow string size to change, or is that something HDF5 doesn't allow?

lawrence-mbf commented 3 years ago

By default, MatNWB stores everything as variable-length strings, so it shouldn't matter what length you rewrite them as (you can test this by changing the strings set in tests.system.DynamicTableTest). If the strings are stored with a static length, then yes, the same size constraints apply.
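One way to check how a given string column is actually stored on disk is to inspect its HDF5 datatype with MATLAB's built-in h5info. This is a sketch with an illustrative file name and path; the exact shape of the Datatype struct may vary by MATLAB version:

```matlab
% Sketch: inspect whether a string dataset is variable- or fixed-length.
% File name and dataset path below are hypothetical.
info = h5info('sample.nwb', '/general/extracellular_ephys/electrodes/location');
dtype = info.Datatype;
if strcmp(dtype.Class, 'H5T_STRING') && strcmp(dtype.Type.Length, 'H5T_VARIABLE')
    disp('variable-length strings: rewriting with different lengths should work');
else
    disp('fixed-length strings: replacement values must keep the same length');
end
```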

nmarkowitz commented 3 years ago

For variable length strings (which was my case) I was able to write new values if I passed a whole new set of data into the VectorData object.

fname = 'myfile.nwb';
nwb = nwbRead(fname);
nrows = nwb.general_extracellular_ephys_electrodes.vectordata.get('location').data.dims;
nwb.general_extracellular_ephys_electrodes.vectordata.get('location').data = repmat({'somewhere'},nrows,1);
nwbExport(nwb,fname)

Regarding boolean uint8 values, I think I'm going to avoid those for now. I'll close this issue.