NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License

problem loading large data matrix #39

Closed: bendichter closed this issue 6 years ago

bendichter commented 6 years ago

I am trying to load a file with a large LFP data block. The shape is saved as 50461375x80 with type int16. When I try to run nwbRead I run into several problems. The error I receive is:

Error using double
Requested 4036910000x1 (30.1GB) array exceeds maximum array
size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
See array size limit or preference panel for more
information.

Error in types.util.checkDtype (line 40)
        val = eval([type '(val)']);

Error in types.core.ElectricalSeries/validate_data (line 33)
        val = types.util.checkDtype('data', 'double', val);

Error in types.core.TimeSeries/set.data (line 96)
        obj.data = obj.validate_data(val);

Error in types.core.TimeSeries (line 72)
        obj.data = p.Results.data;

Error in types.core.ElectricalSeries (line 17)
        obj = obj@types.core.TimeSeries(varargin{:});

Error in io.parseGroup (line 68)
    parsed = eval([typename '(kwargs{:})']);

Error in io.parseGroup (line 26)
    subg = io.parseGroup(filename, g_info);

Error in io.parseGroup (line 26)
    subg = io.parseGroup(filename, g_info);

Error in io.parseGroup (line 26)
    subg = io.parseGroup(filename, g_info);

Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);

1) The data should be read passively (lazily), but it appears that it is not, which is what overloads RAM.
2) The data should be int16, but it appears to be cast to double.
3) The shape looks like it is being lost and the array flattened: the error describes the data as 4036910000x1, but it should be 50461375x80 (the quick size check below is consistent with 2 and 3).
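As a sanity check on (2) and (3), the numbers in the error message line up with a flattened double copy of the full block; the snippet below just redoes that arithmetic (no file needed, the sizes are taken from the error).

% Quick check of the sizes reported in the error message.
nSamples  = 50461375;                       % time dimension of the LFP block
nChannels = 80;                             % channel dimension
nElements = nSamples * nChannels;           % 4036910000, the flattened size in the error
fprintf('as double: %.1f GiB\n', nElements * 8 / 2^30);   % ~30.1 GiB, matches the error
fprintf('as int16 : %.1f GiB\n', nElements * 2 / 2^30);   % ~7.5 GiB if the dtype were kept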

With a little keyboard-mode exploration I found g_info.Datasets(1).Datatype.Class: 'H5T_INTEGER' and .Type: 'H5T_STD_I16LE', which looks right to me, so matnwb appears to be reading this information correctly but perhaps not applying it. If the strategy is to import as a double and then recast to the desired type, that is going to keep causing RAM issues and should probably be refactored. Also, g_info.Datasets(1).Dataspace.Size is [80 50461375], which I think should be transposed; NWB is pretty particular about the time dimension always being first.
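For reference, the on-disk type and shape can be checked with MATLAB's built-in HDF5 functions, independently of matnwb, and a hyperslab read keeps memory small. The dataset path below is a placeholder; the real path depends on where the ElectricalSeries lives in the file. Note also that h5info/h5read report dimensions in reversed order relative to the file's C-order layout (MATLAB is column-major), which may be where the [80 50461375] comes from rather than an actual transposition on disk.

% Placeholder filename and dataset path; substitute the actual locations in the file.
fname = 'my_session.nwb';
dset  = '/acquisition/lfp/data';

info = h5info(fname, dset);
disp(info.Datatype.Type);        % 'H5T_STD_I16LE' for int16 data on disk
disp(info.Dataspace.Size);       % [80 50461375]: MATLAB reverses the on-disk 50461375x80 layout

% Read only the first 1000 samples across all 80 channels instead of the whole block.
chunk = h5read(fname, dset, [1 1], [80 1000]);
whos chunk                       % stays int16 and small; no 30 GB allocation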

The file I am trying to import is quite large (6 GB), and I think you might be able to investigate these issues without it, but if you'd like it, let me know the best way to share it with you.

becb9b716629365ee9e78e1326cd1ca107f6a97e

lawrence-mbf commented 6 years ago

Re 1: Yeah, it should be a DataStub. IMO, (2) and (3) are dependent on this one.

Re 2: The schema actually specifies that ElectricalSeries's data type should be a float (which is mapped to MATLAB's double type), so I'm unsure why the file is using int16 values instead.
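The trace above shows where that conversion happens: types.util.checkDtype builds the cast with eval([type '(val)']), so a 'double' target converts the whole array at once. A stripped-down illustration of that pattern (not the actual matnwb source, just the shape of the call from the trace):

% Illustration of the cast pattern in types.util.checkDtype (line 40), not matnwb's code.
raw  = zeros(1000, 8, 'int16');     % stand-in for the on-disk int16 block
type = 'double';
val  = eval([type '(raw)']);        % i.e. double(raw): allocates a copy 4x the int16 footprint
% For the real 50461375x80 block that copy alone is ~30 GB, hence the array size error.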

I'm unable to actually recreate it using matnwb. In g_info.Datasets(1).Dataspace, what is the listed Type?

bendichter commented 6 years ago

struct with fields:

       Size: [80 50461375]
    MaxSize: [80 50461375]
       Type: 'simple'

bendichter commented 6 years ago

Oh, I see. Well, the original data is stored as int16 and pynwb doesn't complain as long as you stick with numeric types. I agree that this is a non-issue once we fix (1).

lawrence-mbf commented 6 years ago

3adccde3162c9cde09aa3a27642582eb8c0552fa See if this works.
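A rough way to check the fix: after pulling that commit, nwbRead should return quickly and hand back a lazy stub for the data rather than a full numeric array, so the 30 GB allocation never happens. The object and method names below are assumptions about the lazy-read API, not something confirmed in this thread.

% Rough verification sketch; the accessor names below are assumptions.
nwb = nwbRead('my_session.nwb');      % should return quickly, without the 30 GB allocation
es  = nwb.acquisition.get('lfp');     % assumed name/location of the ElectricalSeries
disp(class(es.data));                 % expect a lazy stub class rather than a numeric array
% Pulling values then becomes an explicit, on-demand step (exact API assumed):
% chunk = es.data.load();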

bendichter commented 6 years ago

looks like it went through! Thanks!