Open weiglszonja opened 4 months ago
In data6.mat, each row corresponds to a single session.
Loaded in MATLAB, the table contains string variables, cell arrays, and nested tables.
When loading data6.mat with h5py, I noticed that the "data6" structure appears as a (1, 6) array, which suggests a MATLAB-specific encoding scheme.
I used a script to iterate over each row of the table and flatten nested structures (e.g. the "data" column, which contains a table) to organise it into a structured format that can be read in Python.
However, I still cannot access the dataset inside the "data" variable, which still shows up as:
I tried an alternative solution to iterate over each row and save the processed data into separate .mat files, where each file corresponds to a row of the table.
% Get the size of the table
[numRows, numColumns] = size(data6_in);
for row = 1:numRows
    data6 = struct();
    for column = 1:numColumns
        val = data6_in{row, column};
        column_name = data6_in.Properties.VariableNames{column};
        if iscell(val)
            % Get the first element of the cell array
            val = val{1};
        end
        % Convert MATLAB string variable to char array
        if isstring(val) || ischar(val)
            val = char(val);
        end
        % Check if the column name is "data"
        if strcmp(column_name, 'data')
            % Check if val is a table
            if istable(val)
                % Flatten the table contents into structArray
                tableRow = struct();
                tableColumns = val.Properties.VariableNames;
                for k = 1:numel(tableColumns)
                    if iscell(val{1, k})
                        % Get the first element of the cell array
                        unpacked_cell = val{1, k}{1};
                        tableRow.(tableColumns{k}) = unpacked_cell;
                    else
                        tableRow.(tableColumns{k}) = val{1, k};
                    end
                end
                data6.data = tableRow;
            else
                data6.(column_name) = val;
            end
        else
            data6.(column_name) = val;
        end
    end
    filename = sprintf('/Volumes/LaCie/CN_GCP/Dombeck/tmp/%s.mat', data6_in.data{row}.Properties.RowNames{1});
    save(filename, 'data6', '-v7.3');
end
Loading this file with h5py and accessing data:
I propose adding this as a helper function to be run in MATLAB before the conversion; it creates temporary files that can be removed after the converter has finished.
Azcorra2023 Conversion notes
Simultaneous traces (velocity from rotary encoder, trigger signals for reward, air puff and light stimuli delivery, licking from a lick sensor, fluorescence detected by photomultiplier tubes (PMTs) from one or two optic fibers and output from waveform generator used to alternate 405-nm and 470-nm illumination every 10 ms) were collected at 4 kHz by a PicoScope 6 data acquisition system.
Folder structure
Recordings are organized in folders indicating the batch of mice. Within them, each mouse has its own folder, with further folders inside, one per recording as saved by Picoscope. Recordings are labelled with the date, so a set of recordings in one session will all be labelled with the date and a number.
Picoscope default folder structure
Each set of recording folders must be inside a folder named as 'experiment type-mouse ID' - example 'VGlut-A997'. All recording sets from the same mouse should be in this same folder.
Each recording in a set from a single mouse and session should be named as 'date-recording number', where recording number = number within the recording set and is 4 digits long: yyyymmdd-00##.
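The naming convention above can be checked programmatically before conversion. The following is a minimal sketch using only the Python standard library; the function names and regular expressions are illustrative, and it assumes the experiment type itself contains no hyphen (as in 'VGlut-A997').

```python
import re
from datetime import date

# Assumed patterns derived from the convention described above:
# mouse folder = 'experiment type-mouse ID', recording folder = 'yyyymmdd-00##'.
MOUSE_FOLDER = re.compile(r"^(?P<experiment>[^-]+)-(?P<mouse_id>.+)$")
RECORDING_FOLDER = re.compile(r"^(?P<date>\d{8})-(?P<number>\d{4})$")


def parse_mouse_folder(name):
    """Split a mouse folder name into (experiment type, mouse ID)."""
    match = MOUSE_FOLDER.match(name)
    if match is None:
        raise ValueError(f"Unexpected mouse folder name: {name!r}")
    return match["experiment"], match["mouse_id"]


def parse_recording_folder(name):
    """Split a recording folder name into (recording date, recording number)."""
    match = RECORDING_FOLDER.match(name)
    if match is None:
        raise ValueError(f"Unexpected recording folder name: {name!r}")
    digits = match["date"]
    recording_date = date(int(digits[:4]), int(digits[4:6]), int(digits[6:]))
    return recording_date, int(match["number"])
```

For example, `parse_mouse_folder("VGlut-A997")` yields the experiment type and mouse ID, and `parse_recording_folder("20200129-0001")` yields the date and the recording number within the set.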
Picoscope data
The variables in a Picoscope file (e.g. 20200129-0001_01.mat) are as follows:
A = velocity (chMov)
B = fiber 2 fluorescence (from here on called channel red - chR)
C = fiber 1 fluorescence (channel green - chG)
D = light stimulus trigger
E = waveform generator output indicating illumination wavelength (1 = 470 nm, 0 = 405 nm)
F = reward delivery trigger
G = licking sensor output
H = air puff delivery trigger
Variable A (velocity) is in units of Volts and can be converted to m/s using the conversion factor of 0.6766 (code).
Variables D, F, G, H are binary signals (threshold 0.05); E is used to separate the fluorescence due to 405 vs 470 nm illumination.
There is only one set of time data for all channels and this is loaded in one of two possible formats:
· A start time, an interval and a length. The variables are named Tstart, Tinterval and Length.
· An array of times (sometimes used for ETS data). The time array is named T.
If the times are loaded in as Tstart, Tinterval and Length then you can use the following command to create the equivalent array of times:
T = [Tstart : Tinterval : Tstart + (Length - 1) * Tinterval];
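The three conversions described above (time vector, velocity scaling, and thresholding of the trigger channels) can be sketched in Python as follows. This is a minimal illustration on plain lists, not the converter's implementation; the function names are hypothetical, and treating a sample as "high" when it is strictly greater than 0.05 is an assumption, since the notes only give the threshold value.

```python
VOLTS_TO_M_PER_S = 0.6766  # conversion factor quoted in the notes
BINARY_THRESHOLD = 0.05    # threshold quoted for channels D, F, G, H


def build_time_vector(t_start, t_interval, length):
    """Equivalent of T = [Tstart : Tinterval : Tstart + (Length - 1) * Tinterval]."""
    return [t_start + i * t_interval for i in range(length)]


def velocity_from_volts(volts):
    """Scale channel A from Volts to m/s."""
    return [v * VOLTS_TO_M_PER_S for v in volts]


def binarize(signal, threshold=BINARY_THRESHOLD):
    """Map an analog trigger trace to 0/1 (assumption: strictly above threshold is 1)."""
    return [1 if sample > threshold else 0 for sample in signal]
```

Computing each time point as `t_start + i * t_interval` avoids the drift that repeated addition of the interval would accumulate over long recordings.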
View traces
Traces from 2020-02-26 Vglut2/VGlut-A997/20200205-0001
Note: These signals look more like a control signal than a physiological response, so where are the raw traces? From their manuscript:
From discussion with @alessandratrapani:
So probably if we look at the traces in each "flat" part of the square wave, they correspond to the raw fluorescence signal: when C is in the high flat part it should be what we see in ChGreen405, and when C is in the low flat part it should be the signal we have in ChGreen (470), looking at the same time period of course.
I'll double check this; if we are right, then the data from the PicoScope is going to be added as ElectricalSeries with the description above.
Concatenated recordings
Based on this preprocessing script, they are concatenating the raw recordings (fluorescence and behavior), separating the fluorescence from 470 nm vs 405 nm (isosbestic control) illumination, and saving it as a binned file (the script re-bins the data from 4 kHz to 100 Hz). The script outputs a .mat file named 'Binned405_(experiment)-(mouseID)-(recording date yyyymmdd).mat' containing a structure named T.
Note: I'll double check, but the data from the binned files ("ChRed", "ChGreen") is going to be added as raw fluorescence traces.
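To make the separation and re-binning steps concrete, here is a rough Python sketch. It is not the Dombeck lab script, and how that script combines the two steps is not shown here. The sketch assumes the waveform channel (E) is high during 470 nm illumination and low during 405 nm, averages the fluorescence within each contiguous illumination segment, and separately shows block averaging by an integer factor (4 kHz to 100 Hz corresponds to a factor of 40). The function names and the 0.5 threshold are hypothetical.

```python
from itertools import groupby
from statistics import fmean


def split_by_illumination(fluorescence, waveform, threshold=0.5):
    """Average fluorescence within each contiguous illumination segment.

    Returns (means_470, means_405), one value per segment.
    Assumption: waveform > threshold means 470 nm, otherwise 405 nm.
    """
    if len(fluorescence) != len(waveform):
        raise ValueError("fluorescence and waveform must have the same length")
    means_470, means_405 = [], []
    start = 0
    for is_470, run in groupby(w > threshold for w in waveform):
        run_length = len(list(run))
        segment_mean = fmean(fluorescence[start:start + run_length])
        (means_470 if is_470 else means_405).append(segment_mean)
        start += run_length
    return means_470, means_405


def rebin_mean(samples, factor):
    """Average consecutive blocks of `factor` samples; a trailing partial block is dropped."""
    n_bins = len(samples) // factor
    return [fmean(samples[i * factor:(i + 1) * factor]) for i in range(n_bins)]
```

Samples close to each wavelength switch may still reflect the previous illumination state; this sketch does not discard them.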
Picoscope -> "binned" variable names
The separated fluorescence is saved to the "chRed405" and "chGreen405" variables. Example snippet:
Depth = depth per recording in set
Processed recordings
The following steps are done to raw recordings based on their analysis code:
Concatenate the raw recordings from Picoscope, separate the fluorescence due to 405 vs 470 nm illumination (405 nm is GCaMP's isosbestic point and thus serves as a movement control), and re-bin the data from 4 kHz to 100 Hz.
The output is saved to a .mat file (data6.mat) which contains the experiment type, subject identifier, date of experiment, baseline fluorescence, and normalized DF/F (normalized from 0 to 1), as well as the recording location, mouse sex, etc.
Variables
Exp: experiment name
Mouse: subject identifier
Date: date of experiment
data: final DF/F, velocity in m/s and acceleration in m/s^2
depthG: the depth of the fiber (Green channel)
depthR: the depth of the fiber (Red channel)
Day: 1, 2, or 3
Type: experiment type
RunRew: whether the animal was running or receiving reward
chG: the location of fiber?
chR: the location of fiber?
Gen: the sex of the mouse
cropStart: the crop point (if the 405 channel shows many movement artifacts (rare), exclude the recording; if the start of the recording is still decaying after the initial correction, crop it off)
base: baseline fluorescence
norm: normalized DF/F
flip, dup: a recording with data for chG and chR will be duplicated and one of the copies will have chG/chR flipped
timeRun: the time the mouse spent running in each recording
sig2noise: only use recordings with signal-to-noise ratios above 10
exStr: which recordings are made in axons vs cell bodies (1 = axons, 0 = cell bodies snc/vta, nan = empty)
exSig2Noise: which recordings have high enough signal-to-noise ratio (1 = good, 0 = bad)
exRun: which recordings have enough running time to include for movement analysis (1 = good, 0 = bad)
Acc405: the max value of cross-correlation between the 405 channel for each fiber (used to determine eligibility for locomotion analysis)
ex405Acc: which recordings must be excluded from locomotion analysis due to high correlations between the 405 channel and acceleration (1 = good, 0 = bad, nan = not enough running)
exBothCh: which recording pairs have both fibers (chG and chR) with signal-to-noise ratios above threshold, for dual comparisons (1 = good, 0 = bad)
exSNcStr: which recording pairs have one fiber in SNc and one in striatum, for simultaneous comparisons of somas and axons (1 = good, 0 = bad)
RG405: the max value of averaging over cross-correlation between the 405 in one fiber vs the 470 of the other and vice versa (used to determine eligibility for dual comparisons between simultaneous recordings)
exRG405: which recordings must be excluded from simultaneous fiber-to-fiber comparisons due to high correlation between the 405 and 470 channels of opposite fibers (1 = good, 0 = bad)
Bad405All: which recordings have too many artifacts in the 405 channel (1 = bad, 0 = good, nan = not enough sig2noise)
RecLocG
RecLocRmm: recording location of Red channel in units of mm
RecLocGmm: recording location of Green channel in units of mm
RecLocRshift
RecLocGshift
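As an illustration of how the exclusion flags above might be combined, the sketch below selects recordings for locomotion analysis. It assumes each row of data6 has already been flattened into a Python dict keyed by the variable names. Requiring exSig2Noise, exRun and ex405Acc to all equal 1 is an assumption based on the descriptions above, not a rule confirmed by the analysis code, and the example rows are made up.

```python
import math

# Assumed set of flags that must all be "good" (== 1) for locomotion analysis.
LOCOMOTION_FLAGS = ("exSig2Noise", "exRun", "ex405Acc")


def is_good(flag):
    """True only for a flag equal to 1; 0, NaN and missing values count as not good."""
    if not isinstance(flag, (int, float)) or math.isnan(flag):
        return False
    return flag == 1


def eligible_for_locomotion(recording):
    """Check every locomotion-related flag of one flattened data6 row."""
    return all(is_good(recording.get(name)) for name in LOCOMOTION_FLAGS)


# Hypothetical rows for demonstration only.
example_rows = [
    {"Mouse": "A", "exSig2Noise": 1, "exRun": 1, "ex405Acc": 1},
    {"Mouse": "B", "exSig2Noise": 1, "exRun": 0, "ex405Acc": float("nan")},
    {"Mouse": "C", "exSig2Noise": 0, "exRun": 1, "ex405Acc": 1},
]
selected = [row["Mouse"] for row in example_rows if eligible_for_locomotion(row)]
```

Note that Bad405All uses the opposite convention (1 = bad), so it would need its own check if it were added to the selection.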
NWB mapping
This is a draft based on our assumptions and the SOW. The (?) mark indicates that the data have not been shared.
References
Azcorra 2023 manuscript
Raw fiber photometry recordings on Zenodo
Processed fiber photometry recordings on Zenodo
PicoScope manual
Data processing script in MATLAB from Dombeck to concat PicoScope recordings