catalystneuro / dombeck-lab-to-nwb

NWB Conversion project for the Dombeck lab at Northwestern University.
MIT License

Azcorra2023 - Fiber photometry conversion notes #1

Open weiglszonja opened 4 months ago

weiglszonja commented 4 months ago

Azcorra2023 Conversion notes

Simultaneous traces (velocity from rotary encoder, trigger signals for reward, air puff and light stimuli delivery, licking from a lick sensor, fluorescence detected by photomultiplier tubes (PMTs) from one or two optic fibers and output from waveform generator used to alternate 405-nm and 470-nm illumination every 10 ms) were collected at 4 kHz by a PicoScope 6 data acquisition system.

Folder structure

Recordings are organized in folders indicating the batch of mice. Within them, each mouse has its own folder, with further folders inside, one per recording as saved by Picoscope. Recordings are labelled with the date, so a set of recordings in one session will all be labelled with the date and a number.

Picoscope default folder structure

Each set of recording folders must be inside a folder named 'experiment type-mouse ID', for example 'VGlut-A997'. All recording sets from the same mouse should be in this same folder.

Each recording in a set from a single mouse and session should be named 'date-recording number', where the recording number is the number within the recording set, padded to 4 digits: yyyymmdd-00##.

2020-02-26 Vglut2/  # experiment folder
└── VGlut-A997/  # subject folder
    ├── 20200129-0001/  # session folder
    │   ├── 20200129-0001_01.mat  # individual recordings that belong to the same session
    │   ├── 20200129-0001_02.mat
    │   ├── ...
    │   ├── 20200129-0001_20.mat
    │   ├── 20200129-0001_21.mat

Picoscope data

The variables in Picoscope (e.g. 20200129-0001_01.mat) are as follows:

Variable A (velocity) is in units of Volts and can be converted to m/s using the conversion factor of 0.6766 (code)

% convert velocity to m/s
data4.data{rec}.chMov{depth} = data4.data{rec}.chMov{depth}*convFactor; %ms-1/V - see Maite LabBook 2020-21 p63
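For the Python side of the conversion, the same scaling can be sketched as follows (a minimal sketch; only the 0.6766 factor comes from the notes above, the function name is ours):

```python
import numpy as np

VOLTS_TO_MPS = 0.6766  # m/s per volt, conversion factor from the lab notes above

def velocity_to_mps(volts):
    """Convert the rotary-encoder readout (channel A, in volts) to m/s."""
    return np.asarray(volts, dtype=float) * VOLTS_TO_MPS
```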

Variables D, F, G, H are binary signals (threshold 0.05); E is used to separate the fluorescence due to 405 vs 470 nm illumination.
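These trigger channels can be turned into 0/1 event signals with a simple threshold (a sketch; the 0.05 V threshold is from the notes above, everything else is an assumption):

```python
import numpy as np

THRESHOLD = 0.05  # volts, per the notes above

def digitize(trace, threshold=THRESHOLD):
    """Convert an analog trigger trace (channels D, F, G, H) to a 0/1 signal."""
    return (np.asarray(trace, dtype=float) > threshold).astype(np.uint8)
```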

There is only one set of time data for all channels, and it is loaded in one of two possible formats:

- A start time, an interval and a length, in variables named Tstart, Tinterval and Length.
- An array of times (sometimes used for ETS data), named T.

If the times are loaded in as Tstart, Tinterval and Length, then you can use the following command to create the equivalent array of times:

T = [Tstart : Tinterval : Tstart + (Length - 1) * Tinterval];
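In Python the same time array can be reconstructed with numpy (a sketch; at the 4 kHz sampling rate Tinterval would be 0.25 ms):

```python
import numpy as np

def make_time_array(t_start, t_interval, length):
    """Reconstruct the time array from Tstart, Tinterval and Length."""
    return t_start + t_interval * np.arange(length)
```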

View traces

Traces from 2020-02-26 Vglut2/VGlut-A997/20200205-0001

Note:

These signals look more like a control signal than a physiological response, so where are the raw traces then? From their manuscript:

"Fluorescence collected during 405-nm or 470-nm illumination (20 time bins for each pulse of 405-nm or 470-nm excitation) was separated using the binary output from the waveform generator. For each transition period between illumination sources, five time bins were excluded to remove transition times. Traces were then re-binned to 100 Hz by averaging every 40 time bins for velocity and every 40 time bins for 405-nm and 470-nm fluorescence traces"

From discussion with @alessandratrapani :

So, probably, if we look at the traces in each "flat" part of the square wave, they correspond to the raw fluorescence signal: when C is in the high flat part it should be what we see in ChGreen405, and when C is in the low flat part it should be the signal we have in ChGreen (470 nm), looking at the same time period in both cases.

I'll double check this; if we are right, then the data from the PicoScope is going to be added as an ElectricalSeries with the description above.
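If this reading is correct, the separation could be sketched roughly as follows (a sketch under our assumptions: the 0.05 threshold from above, dropping 5 bins after each transition per the manuscript excerpt, and the guess that the high level corresponds to 470 nm illumination, which still needs verification against their code):

```python
import numpy as np

def split_fluorescence(fluo, ctrl, threshold=0.05, n_drop=5):
    """Split a fluorescence trace into 405/470 nm parts using the
    waveform-generator output (a square wave), dropping n_drop bins
    after each transition, as described in the manuscript excerpt."""
    fluo = np.asarray(fluo, dtype=float)
    high = np.asarray(ctrl) > threshold
    keep = np.ones(fluo.size, dtype=bool)
    # indices where the control signal flips between low and high
    transitions = np.flatnonzero(np.diff(high.astype(np.int8))) + 1
    for t in transitions:
        keep[t:t + n_drop] = False
    # assumption: high level = 470 nm illumination (to be verified)
    return fluo[keep & ~high], fluo[keep & high]  # (sig_405, sig_470)
```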

Concatenated recordings

Based on this preprocessing script, they are concatenating the raw recordings (fluorescence and behavior), separating the fluorescence from 470 nm vs 405 nm (isosbestic control) illumination, and saving it as a binned file (re-binning the data from 4 kHz to 100 Hz). The script outputs a .mat file named 'Binned405_(experiment)-(mouseID)-(recording date yyyymmdd).mat' with a structure named 'T' in it.
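For locating these outputs during conversion, the naming scheme can be sketched as (a hypothetical helper, not from their code; only the name pattern comes from the script):

```python
def binned_filename(experiment, mouse_id, date):
    """Build the expected binned-file name:
    'Binned405_(experiment)-(mouseID)-(yyyymmdd).mat'."""
    return f"Binned405_{experiment}-{mouse_id}-{date}.mat"
```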

Note: I'll double check, but the data from the binned files "ChRed", "ChGreen" is going to be added as raw fluorescence traces.

Picoscope -> "binned" variable names

dict(
  A="chMov",
  B="chRed",
  C="chGreen",
  D="Light",
  E="ch470",
  F="Reward",
  G="Licking",
  H="AirPuff",
)

The separated fluorescence is saved to the "chRed405" and "chGreen405" variables. Example snippet (Depth = depth per recording in the set):

[screenshot: example MATLAB snippet]
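The Picoscope-to-binned renaming above can be sketched in Python (the helper name is ours; the mapping itself is from the table above):

```python
# Mapping from Picoscope channel letters to the "binned" variable names
CHANNEL_NAME_MAP = dict(
    A="chMov", B="chRed", C="chGreen", D="Light",
    E="ch470", F="Reward", G="Licking", H="AirPuff",
)

def rename_channels(mat_vars):
    """Rename raw Picoscope variables (A..H) to their binned equivalents;
    variables not in the map (e.g. Tstart, Tinterval, Length) pass through."""
    return {CHANNEL_NAME_MAP.get(k, k): v for k, v in mat_vars.items()}
```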

Processed recordings

The following steps are done to raw recordings based on their analysis code:

  1. Concatenate the raw data collected on Picoscope, separate the fluorescence due to 405 vs 470 nm illumination (405 nm is GCaMP's isosbestic point and thus serves as a movement control), and re-bin the data from 4 kHz to 100 Hz.
  2. Calculate DF/F from raw fluorescence by correcting for fiber/setup/tissue autofluorescence and bleaching, then normalize transients from 0 to 1 (saving the normalization value to allow reverting it). This step also converts velocity from Volts, as read out from the rotary encoder, to m/s (see code).
  3. Select the behavioral events used to obtain triggered averages (rewards, air puffs, accelerations, ...).
  4. Determine exclusion criteria for each recording (running time, signal-to-noise ratio) to decide which recordings will be included in the analysis.
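Steps 1 and 2 can be sketched in Python (assumptions: a factor of 40 for the 4 kHz to 100 Hz re-binning, and a simple min/max normalization standing in for their exact DF/F procedure):

```python
import numpy as np

def rebin(trace, factor=40):
    """Re-bin by averaging every `factor` samples (4 kHz -> 100 Hz for factor=40)."""
    trace = np.asarray(trace, dtype=float)
    n = (trace.size // factor) * factor  # drop any trailing partial bin
    return trace[:n].reshape(-1, factor).mean(axis=1)

def normalize_dff(dff):
    """Normalize a DF/F trace to [0, 1], returning (lo, scale) so the
    normalization can be reverted, as the analysis code does."""
    dff = np.asarray(dff, dtype=float)
    lo, hi = dff.min(), dff.max()
    scale = hi - lo  # assumed nonzero for a real transient trace
    return (dff - lo) / scale, (lo, scale)
```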

The output is saved to a .mat file (data6.mat) which contains the experiment type, subject identifier, date of experiment, baseline fluorescence, and normalized DF/F (normalised from 0 to 1); it also contains the recording location, mouse sex, etc.

Variables

fibers = {'chGreen' 'chRed'};

NWB mapping

This is a draft based on our assumptions and the SOW. The (?) mark indicates data that have not been shared.

[diagram: dombeck-Page-1, NWB mapping]

weiglszonja commented 3 months ago

In data6.mat each row corresponds to a single session set. Here is a snippet of how this looks when loaded in MATLAB:

[screenshot: data6.mat loaded in MATLAB]

which has string variables, cell arrays, and tables inside.

When loading data6.mat with h5py I noticed the "data6" structure appears as a (1,6) array:

[screenshot: "data6" shown as a (1,6) array in h5py]

which suggests a MATLAB-specific encoding scheme.
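MATLAB v7.3 files are HDF5 under the hood, and cell arrays are stored as object-reference datasets that h5py returns un-dereferenced; resolving them can be sketched as follows (the synthetic file below only stands in for data6.mat):

```python
import h5py
import numpy as np

def deref_cells(f, dataset):
    """Resolve an HDF5 object-reference dataset (how MATLAB v7.3 stores
    cell arrays) into the arrays it points to."""
    return [f[ref][()] for ref in np.asarray(dataset[()]).flat]

# Minimal round trip with a synthetic file standing in for data6.mat
with h5py.File("refs_demo.h5", "w") as f:
    a = f.create_dataset("a", data=np.array([1.0, 2.0]))
    b = f.create_dataset("b", data=np.array([3.0]))
    cells = f.create_dataset("cells", (2,), dtype=h5py.ref_dtype)
    cells[0], cells[1] = a.ref, b.ref
    values = deref_cells(f, cells)
```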

I used a script to iterate over each row of the table and flatten nested structures (e.g. the "data" column, which contains a table inside) to organise it into a structured format that can be read in Python.

However, I still cannot access the dataset inside the "data" variable, which still shows up as:

[screenshot: the "data" variable in h5py]

I tried an alternative solution: iterate over each row and save the processed data into separate .mat files, where each file corresponds to a row of the table.


% Get the size of the table
[numRows, numColumns] = size(data6_in);

for row = 1:numRows
    data6 = struct();
    for column = 1:numColumns
        val = data6_in{row, column};
        column_name = data6_in.Properties.VariableNames{column};
        if iscell(val)
            % Get the first element of the cell array
            val = val{1};
        end

        % Convert MATLAB string variable to char array
        if isstring(val) || ischar(val)
            val = char(val);
        end

        % Check if the column name is "data"
        if strcmp(column_name, 'data')
            % Check if val is a table
            if istable(val)
                % Flatten the table contents into structArray
                tableRow = struct();
                tableColumns = val.Properties.VariableNames;
                for k = 1:numel(tableColumns)
                    if iscell(val{1, k})
                        % Get the first element of the cell array
                        unpacked_cell = val{1, k}{1};
                        tableRow.(tableColumns{k}) = unpacked_cell;
                    else
                        tableRow.(tableColumns{k}) = val{1, k};
                    end
                end

                data6.data = tableRow;
            else
                data6.(column_name) = val;
            end
        else
            data6.(column_name) = val;
        end

    end

    filename = sprintf('/Volumes/LaCie/CN_GCP/Dombeck/tmp/%s.mat', data6_in.data{row}.Properties.RowNames{1});
    save(filename, 'data6', '-v7.3');
end

Loading this file with h5py and accessing data:

[screenshot: data loaded from the per-row .mat file with h5py]

I would propose adding this as a helper function to be run in MATLAB before running the conversion; it will create temporary files that can be removed after the converter has finished.