NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License

Unexpected additional entry in vectordata #513

Closed: GoktugAlkan closed this issue 1 year ago

GoktugAlkan commented 1 year ago

Hello,

We created NWB files in which the field nwb.units is populated according to our previous discussion. We also added a cell array that contains the labels of all detected spikes. This cell array can be accessed by executing nwb.units.vectordata.get('labels').

When we execute nwb.units.vectordata.get('labels') we see the following:

  DataStub with properties:

    filename: '/home/matlab/testNWB.nwb'
        path: '/units/labels'
        dims: 10327084
       ndims: 1
    dataType: 'char'

This means that we should have 10327084 entries in this array, which is correct so far. However, when we execute nwb.units.vectordata.get('labels').data(10327084+1) we get the following answer:

ans =

  1×1 cell array

    {0×0 char}

We are wondering why there is an empty entry at index 10327084+1 although the whole cell array should just contain 10327084 entries. We would have expected to see an error when we execute this command.

Because of this issue, we are not sure we can trust the answers when we load individual entries of this array by executing, for example, nwb.units.vectordata.get('labels').data([10275652,10327080]).

Remark: Although nwb.units.vectordata.get('labels').data(10327084+1) is an empty entry, executing nwb.units.vectordata.get('labels').data([10275652,10327085]) outputs

2×1 cell array

    {'label_xx'}
    {'label_xx'}

Many thanks in advance for your help.

lawrence-mbf commented 1 year ago

Hi @GoktugAlkan ,

Is the data returned by nwb.units.vectordata.get('labels').data([10275652,10327085]) correct? It looks like element 10327085 should also throw an error, no?

GoktugAlkan commented 1 year ago

@lawrence-mbf Exactly, normally there should be an error for element 10327085 because we don't have any entry at that position. {'label_xx'} corresponds to the entry at position 10275652, but position 10327085 should produce an error.

Interestingly, the preview also says that there are 10327085 entries in the cell array.

lawrence-mbf commented 1 year ago

@GoktugAlkan With an example script, I can recreate the bug where the data is duplicated if the non-contiguous indices are invalid. I do get an error if I index outside the allocated space though.

GoktugAlkan commented 1 year ago

@lawrence-mbf Did you try to save the nwb file (nwbExport), then read it (nwbRead), and then access entry 10327085?

lawrence-mbf commented 1 year ago

If you run the following script and call DataStub([1, 11]) the output will duplicate the data from index 1.

Here is the full script:

```MATLAB
filename = 'test.h5';
fullDatasetPath = '/data';
datasetName = 'data';
dataSize = 10;
if 2 ~= exist(filename, 'file')
    h5create(filename, fullDatasetPath, dataSize);
end
h5write(filename, fullDatasetPath, (1:10));
h5disp(filename);
DataStub = types.untyped.DataStub(filename, fullDatasetPath);
```

lawrence-mbf commented 1 year ago

Note that just indexing the out-of-bounds DataStub(11) throws an error. Not sure why your code does not.
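
Until the indexing behavior is resolved, one possible workaround is to check requested indices against the element count stored in the HDF5 file itself before indexing. The following is only a minimal sketch and is not part of matnwb: it uses MATLAB's built-in h5create, h5write, and h5info, and it reuses the dataset layout from the script above. The temporary file name and the variable names (numElements, requested, isValid) are illustrative assumptions.

```MATLAB
% Sketch: validate indices against the size recorded in the HDF5 file.
% The file is a throwaway example mirroring the test script above.
filename = [tempname, '.h5'];          % hypothetical temporary file
datasetPath = '/data';
h5create(filename, datasetPath, 10);
h5write(filename, datasetPath, (1:10));

% Read the dataset's stored dimensions directly from the file metadata.
info = h5info(filename, datasetPath);
numElements = prod(info.Dataspace.Size);

% Mixed request: index 1 is valid, index 11 is out of bounds.
requested = [1, 11];
isValid = requested >= 1 & requested <= numElements;

if ~all(isValid)
    fprintf('Out-of-bounds indices: %s (dataset has %d elements)\n', ...
        mat2str(requested(~isValid)), numElements);
end

delete(filename);                      % clean up the example file
```

For the file discussed in this issue, the same h5info call could be pointed at '/units/labels' to compare the size stored on disk with the dims value that DataStub reports.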

lawrence-mbf commented 1 year ago

@GoktugAlkan Out of curiosity, if you use double values for the "labels" property does the same behavior occur? Or, if you have DataStubs of other types, does indexing out of bounds do the same thing?

lawrence-mbf commented 1 year ago

Hi @GoktugAlkan ,

Please check and let me know if this PR fixes things: https://github.com/NeurodataWithoutBorders/matnwb/pull/514