catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: get_default_backend_configuration: auto chunk not good for time series data #1099

Open · bendichter opened 2 months ago

bendichter commented 2 months ago

What happened?

When using get_default_backend_configuration for a long time series, the recommended chunk shape keeps roughly the same aspect ratio as the dataset itself. For a recording with many more samples than channels, this produces chunks that span very long stretches of time but only a few channels, which is sub-optimal for reading short windows of time across all channels, e.g. the way data is accessed in Neurosift. A better chunking for time series would deviate from the shape-similarity convention and provide chunks that hold more (ideally all) channels and correspondingly fewer time samples, as sketched below.
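
To make the trade-off concrete, here is a rough sketch (not the library's internals) of both chunkings under the same ~10 MB chunk budget that the heuristic appears to target, assuming the float64 dataset from the reproduction below:

n_samples, n_channels, itemsize = 10_000_000, 128, 8  # float64 recording
chunk_budget_bytes = 10_000_000  # ~10 MB, the apparent default target

# Shape-similar chunk, as currently returned: 312500 * 4 * 8 bytes = 10 MB,
# but each chunk covers only 4 channels, so reading a short time window
# across all 128 channels touches 128 / 4 = 32 chunks.
similar_chunk = (312_500, 4)

# Channel-spanning alternative with the same byte budget: all channels in
# one chunk, so a short time window touches a single chunk.
samples_per_chunk = chunk_budget_bytes // (n_channels * itemsize)  # 9765
time_friendly_chunk = (samples_per_chunk, n_channels)  # (9765, 128)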

Steps to Reproduce

import numpy as np
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from pynwb.testing.mock.file import mock_NWBFile
from neuroconv.tools.nwb_helpers import get_default_backend_configuration

# A long recording: 10,000,000 samples x 128 channels (float64).
data = np.ones((10_000_000, 128))

nwbfile = mock_NWBFile()
ts = mock_ElectricalSeries(data=data, nwbfile=nwbfile)

backend_config = get_default_backend_configuration(nwbfile, backend="hdf5")
backend_config.dataset_configurations["acquisition/ElectricalSeries/data"].chunk_shape

Output: (312500, 4)
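
As a workaround, the chunk shape can be overridden on the returned configuration before writing. The snippet below is a sketch assuming the dataset configuration's chunk_shape field accepts reassignment; the (9765, 128) value is the hypothetical channel-spanning shape computed above, not anything the library recommends:

from neuroconv.tools.nwb_helpers import configure_backend

# Hypothetical override: same ~10 MB budget, but each chunk spans all 128
# channels so short time windows can be served from a single chunk.
dataset_config = backend_config.dataset_configurations["acquisition/ElectricalSeries/data"]
dataset_config.chunk_shape = (9765, 128)  # 9765 * 128 * 8 bytes ≈ 10 MB

# Apply the (edited) configuration to the in-memory NWB file before writing.
configure_backend(nwbfile=nwbfile, backend_configuration=backend_config)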

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.10

Package Versions

No response

Code of Conduct

h-mayorquin commented 1 week ago

Some relevant discussion here: https://github.com/NeurodataWithoutBorders/pynwb/issues/1945