INM-6 / swan

Swan (Sequential Waveform Analyzer) is an open-source graphical tool for tracking single units across multiple sessions of electrophysiological data that was recorded using chronically implanted microelectrode arrays.
BSD 3-Clause "New" or "Revised" License

New neo IO class for swan #33

Open shashwatsridhar opened 3 years ago

shashwatsridhar commented 3 years ago

Given the recent interest in Swan, I have been trying to get Swan to work with a neo IO class that is compatible with newer versions of neo (currently we only support blackrockio_v4). My idea was to create a pipeline in which the user converts her data to a common intermediary format that is easy to write to (e.g. npy), and we provide a script that converts the intermediary files to a neo-compatible file format.

The problem is, I couldn't find any format in neo that 1) supports writing blocks AND 2) supports lazy loading / channel-by-channel loading. For example, nixIO and pickleIO can write blocks just fine, but the resulting files cannot be loaded channel-by-channel or lazily. For users with many sessions to analyze, this would quickly become intractable due to memory limitations.

One solution is to have the user convert their data to channel-by-channel intermediary files, and then convert the data from each channel to its own .pkl file. While this would work, it is not a very elegant solution, since it requires two conversion steps to get data into a Swan-compatible format.

An alternative solution would be to create a SwanNumpyIO class based on neo.BaseFromRaw and neo.BaseRawIO. This would read in folders corresponding to individual sessions, each containing certain required numpy files, and use numpy's memmap functionality to read in data channel-by-channel. This has three advantages that I can see:

1) users only need to convert their data to numpy (with the structure I propose below),

2) Swan is then relatively independent of neo release cycles, allowing for quicker bug fixes and improvements in data IO, and

3) data can be quickly loaded channel-by-channel.
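
To illustrate the channel-by-channel loading mentioned above, here is a minimal sketch using numpy's memory mapping. It is not the proposed SwanNumpyIO class. The file name `signals.npy`, the `(n_samples, n_channels)` layout, and the `load_channel` helper are all assumptions made for the example, not part of the proposed format.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_channel(session_dir, channel, filename="signals.npy"):
    """Return one channel of a (n_samples, n_channels) array.

    Hypothetical helper: the file name and array layout are assumptions.
    """
    # mmap_mode="r" maps the file read-only instead of reading it all into memory
    data = np.load(Path(session_dir) / filename, mmap_mode="r")
    # np.array(...) copies only the requested column into a regular ndarray
    return np.array(data[:, channel])


# Build a small fake session folder to exercise the loader
session_dir = Path(tempfile.mkdtemp()) / "session_01"
session_dir.mkdir()
fake = np.arange(12, dtype=np.int16).reshape(4, 3)  # 4 samples, 3 channels
np.save(session_dir / "signals.npy", fake)

channel_1 = load_channel(session_dir, channel=1)
print(channel_1)  # → [ 1  4  7 10]
```

With a sample-major layout like this one, reading a single channel touches strided positions across the file; storing the array as `(n_channels, n_samples)` would make each channel contiguous on disk, which may matter for long recordings.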

The numpy format I propose is as follows:

Each session is stored in a folder, whose name corresponds to the dataset name. The folder contains four files:

(I'm still not sure of the precise structure of the metadata.json file)

I have never implemented a neo IO class, so I might be misjudging the complexity of the task itself. I was hoping @JuliaSprenger and @mdenker could share their thoughts and insights here. Do you think it's worth it?

JuliaSprenger commented 3 years ago

Hi @shashwatsridhar. On a first read, this sounds like you are falling into the let-me-introduce-yet-another-standard trap. I think NixIO_fr might help you, as it allows reading Nix files in lazy mode if the neo structure is raw-compatible (i.e. has the same number of channels across segments). Since you are also planning to switch to the latest version of the BlackrockIO, this would be the case anyway for the data you are going to load in the future. Alternatively, if you would still like to separate the different types of data into different files as you describe above, it would make sense to use an existing format (e.g. openephys, exdir) and extend the neo capabilities for it. Of these, the first format can be read by neo but not yet written, and the latter is on the list to be included in neo at some point in the future.