Closed wbwakeman closed 3 years ago
Peter has 1537 experiments here: /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/event_detection
on which he has previously run event detection offline. There are 4 genotypes represented, and these have already specific decay times specified in the code:
Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt 0.34692165705293476
Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai94(TITL-GCaMP6s)/wt 0.7375508677946965
Sst-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt 0.46368295442535684
Vip-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt 0.5204879281347294
These are the same genotypes for the upcoming VB release. In addition to genotype, we want to make sure that we are reproducing the past results across frame rates.
Here are 4 experiments for spot-checking results against Peter's runs. Due probably to environment differences, the results are slightly different. The "new" results are ones that I have run with a new containerized version of the code that is not dependent on VBA or AllenSDK. It is a stepping-stone to making a production version. The 3 plots are ones that Peter and I made to decide these were close enough results. Left panel: pearsons correlation of event traces (Peter's run vs. new run) Middle panel: integrated event magnitude per cell (Peter vs. new) Right panel: minimum non-zero event magnitude (Peter vs. new)
The conclusion here is that for different genotypes and different frame rates relevant to release, we can reproduce Peter's results. @ledochowitsch @matchings
@wbwakeman this seems to be what the event detection will need to run for one experiment:
{
'experiment_id': 1053774037,
'movie_frame_rate_hz': 31.0,
'full_genotype': 'Vip-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt',
'ophysdfftracefile': '/allen/programs/braintv/production/visualbehavior/prod4/specimen_1022959551/ophys_session_1053696498/ophys_experiment_1053774037/1053774037_dff.h5',
'output_file': <desired path of output>,
'valid_roi_ids': [1053801242, 1053801240,1053801245, ...]
}
which I am getting (with a couple of extras) by:
SELECT oe.id as experiment_id,
oe.movie_frame_rate_hz,
eq.name as rig,
d.full_genotype as full_genotype,
oe.storage_directory || wkf.filename as OphysDffTraceFile,
array(SELECT cell_rois.id
FROM cell_rois
WHERE cell_rois.valid_roi=True
AND cell_rois.ophys_experiment_id=oe.id) as valid_roi_ids
FROM ophys_experiments as oe
JOIN ophys_sessions as os ON os.id=oe.ophys_session_id
JOIN specimens as sp on sp.id=os.specimen_id
JOIN donors d on d.id = sp.donor_id
JOIN equipment as eq on eq.id = os.equipment_id
JOIN well_known_files as wkf on wkf.attachable_id=oe.id
JOIN well_known_file_types as wkft on wkf.well_known_file_type_id=wkft.id
WHERE wkft.name='OphysDffTraceFile'
AND oe.id IN ({})
Wayne notes that there are valid ROIs associated with an experiment that can have come from previous (non current) segmentation runs. The strategy should provide valid ROI IDs that are present in the OphysDffTraceFile
or, in the event that the ruby strategy can't look into the hdf5 file, the strategy can provide all valid ROI IDs and the module itself will determine which ones in the DFF file are valid.
Before incorporating into ophys_etl, I made some modifications to simplfiy and remove some dependencies, stored here temporarily: https://github.com/AllenInstitute/djkapner_sandbox/tree/master/l0_event_detection
this includes a dockerfile. This dockerfile and l0_analysys_mesoscopy.py
will be the starting point to absorb this code into ophys_etl.
@wbwakeman cc: @matchings There is not yet a specification for what the output of event detection is. Here is my proposal:
Currently, the df/F step creates a well_known_file: OphysDffTraceFile
. This file contains the keys: data
which is an nROI x nframe array. and also roi_names
which is an length = nROI list of ids, lined up with data
I propose that the new event detection module creates a new well_known_file: OphysEventTraceFile
This will be exactly the same format as the dff output format (hdf5), but instead of df/F in the data
field, it will save the events
output of event detection. events
output is also nROI x nframe. It is mostly zeros. Where it is non-zero, it indicates an event magnitude. For non-cells (valid_roi=False) it will be all zeros.
@ledochowitsch will that events
array capture all needed information about event detection? I think it will.
@njmei I imagine that the data access apis will have a function like get_dff_traces
called get_event_traces
. For example
In this case, the output json of this module will be:
{
'OphysEventTraceFile`: <path to written file>
}
Established a version of the code in a new ophys_etl branch and was was able to reproduce above plots using the ophys_etl CI/CD dockerfile.
Job looks like:
#!/bin/bash
#PBS -q braintv
#PBS -l mem=120g
#PBS -l walltime=12:00:00
#PBS -l nodes=1:ppn=32
#PBS -N event_dev
#PBS -j oe
#PBS -o /allen/aibs/informatics/danielk/event_detection/logs
#PBS -t 1-3
inputs=(\
/allen/aibs/informatics/danielk/event_detection/output/792815735_input.json
/allen/aibs/informatics/danielk/event_detection/output/799368904_input.json
/allen/aibs/informatics/danielk/event_detection/output/881003484_input.json
/allen/aibs/informatics/danielk/event_detection/output/955276580_input.json)
image=docker://alleninstitutepika/ophys_etl_pipelines:develop
export SINGULARITY_DOCKER_USERNAME=<>
export SINGULARITY_DOCKER_PASSWORD=<>
TMPDIR=/scratch/capacity/${PBS_JOBID}
SINGULARITY_TMPDIR=${TMPDIR} singularity run \
--bind /allen:/allen,/scratch/capacity/${PBS_JOBID}:/tmp \
${image} \
/envs/event_detection/bin/python -m ophys_etl.transforms.event_detection \
--input_json ${inputs[${PBS_ARRAYID}]}
and one of those input jsons looks like:
{
"movie_frame_rate_hz": 31.0,
"full_genotype": "Vip-IRES-Cre/wt;A",
"ophysdfftracefile": "/allen/programs/braintv/production/visualbehavior/prod0/specimen_744911458/ophys_session_792619807/ophys_experiment_792815735/792815735_dff.h5",
"valid_roi_ids": [858459595, 858459398, 858459609, ... ],
"output_event_file": "/allen/aibs/informatics/danielk/event_detection/output/792815735.npz"
}
@djkapner regarding event detection output and what we want to save - the events .npz files that Peter has created off pipeline includes additional pieces of information beyond the events trace that may be relevant, including a few metrics (noise_stds and lambda value). Another important thing to consider is whether we want to save both the upsampled 30Hz events trace and the upsampled then downsampled 11Hz trace for mesoscope experiments. Generally, i am in favor of saving anything we think we might want to use in the future, so would advocate to keep the traces at both frame rates, and the metrics. We would only provide one trace to users, but we may want to do QC using both traces to validate the resampling, or use the upsampled versions for some analysis requiring higher temporal resolution.
Here is what is in the events .npz file:
events file is an .npz with the following files within it: dff: array of n_cells x n_timepoints with recalculated dF/F values(at original frame rate) ts: timestamps corresponding to timepoints in dff (at original frame rate) events: array of n_cells x n_timepoints with event magnitudes(at original frame rate) noise stds: array of length(n_cells) giving the value for standard deviation of the noise for all ROIs lambdas: array of length(n_cells) giving the lambda value for all ROIs upsampling_factor: factor used to resample mesoscope data into 30Hz time frame event_dict: event_dict contains one item per cell_roi_id, where each item is a dictionary with 4 keys: mag: event magnitude, sampled at 30Hz idx: (i think) indices into original timestamps before resampling ts: timestamps of events, sampled at 30Hz event_trace: trace of event magnitudes, at original frame rate (11Hz for mesoscope)
@matchings It is possible to include all of that, but it will take more work. The reason I was leaning towards not doing it is that building that .npz file requires getting timestamps:
ts = dataset.timestamps.ophys_frames.values
which relies on VBA and VBA processing of sync files. We don't want the production pipeline dependent on VBA, so I would have to absorb that processing. (production code is not dependent on VBA nor on AllenSDK, which is one time-consuming aspect of absorbing code like this.)
I'm wondering if the processing event detection module
should be responsible for producing events
and VBA (and/or AllenSDK perhaps) is responsible for interpreting those events in a larger context. There are tickets in our queue that follow this one that will update AllenSDK APIs to handle the output of this processing module.
That said, I could easily incorporate everything else not related to timestamps. In fact, it is already done (though needs a little cleanup).
I would also advocate for an hdf5 file format, rather than numpy-specific. The AllenSDK API should hide this from you anyway.
I can just update the AllenSDK API ticket with your exposition of the npz file. That ticket needs more specification anyway,
I wasnt suggesting building & providing the exact .npz file, i also prefer hdf5 in the same format as the dff_traces hdf5 file in lims. Also I dont think including the timestamps is necessary.
Is it possible to include events traces at both temporal resolutions for mesoscope without timestamps?
Is it possible to include events traces at both temporal resolutions for mesoscope without timestamps?
yes, it is already there. I'll just shape it up into an hdf5 file.
I would suggest to include the events traces at the lower resolution, and arrays of event time stamps and matching magnitudes (resolution independent).
-Peter
Get Outlook for iOShttps://aka.ms/o0ukef
From: Dan Kapner notifications@github.com Sent: Thursday, January 7, 2021 11:09:02 AM To: AllenInstitute/ophys_etl_pipelines ophys_etl_pipelines@noreply.github.com Cc: Peter Ledochowitsch peterl@alleninstitute.org; Mention mention@noreply.github.com Subject: Re: [AllenInstitute/ophys_etl_pipelines] Create 2p event detection module (#75)
Is it possible to include events traces at both temporal resolutions for mesoscope without timestamps?
yes, it is already there. I'll just shape it up into an hdf5 file.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAllenInstitute%2Fophys_etl_pipelines%2Fissues%2F75%23issuecomment-756319226&data=04%7C01%7C%7Ca08e1e396ddd4235caa508d8b33fb2b4%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637456433454579610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Bb9gG5zdHkX7cY%2BMMnFn4sTRyAShdSou5bHqkMdEvQ4%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABVLHVTY2ESYUKFWBSQYGELSYYBE5ANCNFSM4SAXC5SA&data=04%7C01%7C%7Ca08e1e396ddd4235caa508d8b33fb2b4%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637456433454589603%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=hWPfsIdvkBEbSWX9IC9E3VrIYL3xs3k5azMUH%2Fz9EJE%3D&reserved=0.
@ledochowitsch
I had made a simple test case so the code can run quickly through some tests. This was meant to be something really easy for the algorithm (high SNR, no drift, etc).
I was discussing with @njmei and it looks like the event index is always 1 less than the fake event time that I specified, and, it looks like this was intentional: out[ev['spikes']-1] = ev['pos_spike_mag']
I just want to confirm you want that off-by-one index in there.
Ah! Great catch, Dan. This part of the code was written by Michael O. I believe that the 1-offset was intentional to fix an indexing issue in Sean’s code. 7 months ago, Sean appears to have fixed that issue himself in fast.py, so now it’s being ‘fixed’ twice, which is once too many:
Best,
-Peter
From: Dan Kapner notifications@github.com Date: Friday, January 8, 2021 at 2:44 PM To: AllenInstitute/ophys_etl_pipelines ophys_etl_pipelines@noreply.github.com Cc: Peter Ledochowitsch peterl@alleninstitute.org, Mention mention@noreply.github.com Subject: Re: [AllenInstitute/ophys_etl_pipelines] Create 2p event detection module (#75)
@ledochowitschhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fledochowitsch&data=04%7C01%7C%7C0ce75f59d26a491cd57c08d8b426f150%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637457426656612864%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=iqWeUqA68KGFaJN5HN6UFe7OK4gOydM4KgghD%2BP8J0I%3D&reserved=0 I had made a simple test case so the code can run quickly through some tests. This was meant to be something really easy for the algorithm (high SNR, no drift, etc). [https://camo.githubusercontent.com/ce2c374e8594e84aa2cbb2cd2944240f0c1e3ef06fb56347b9c6cc32551905c0/68747470733a2f2f696d616765732e7a656e68756275736572636f6e74656e742e636f6d2f3564646463633432303637333561303030316461383462622f36653261663638642d393961622d343564312d616165382d626638656139343164393032]https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcamo.githubusercontent.com%2Fce2c374e8594e84aa2cbb2cd2944240f0c1e3ef06fb56347b9c6cc32551905c0%2F68747470733a2f2f696d616765732e7a656e68756275736572636f6e74656e742e636f6d2f3564646463633432303637333561303030316461383462622f36653261663638642d393961622d343564312d616165382d626638656139343164393032&data=04%7C01%7C%7C0ce75f59d26a491cd57c08d8b426f150%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637457426656622859%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nvxxFzxpP0aOZac%2BTW2EG8rKQpB1MpUGY43OwSMXmI4%3D&reserved=0 I was discussing with @njmeihttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnjmei&data=04%7C01%7C%7C0ce75f59d26a491cd57c08d8b426f150%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637457426656622859%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2FXCJWlmPOebV8XL7iXF0OhSJ74c9hCPT9L0HsPNS5RA%3D&reserved=0 and it looks like the event index is always 1 less than the fake event time that I specified, and, it looks like this was intentional: out[ev['spikes']-1] = ev['pos_spike_mag']
I just want to confirm you want that off-by-one index in there.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAllenInstitute%2Fophys_etl_pipelines%2Fissues%2F75%23issuecomment-757034941&data=04%7C01%7C%7C0ce75f59d26a491cd57c08d8b426f150%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637457426656632852%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3A9dCR7dxeGgEBHbk%2FqYKMXvwfpAv3MKtoWAGgW6Us4%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABVLHVSKWADTRRKPSCKYKADSY6DELANCNFSM4SAXC5SA&data=04%7C01%7C%7C0ce75f59d26a491cd57c08d8b426f150%7C32669cd6737f4b398bddd6951120d3fc%7C0%7C0%7C637457426656632852%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Q3mpryBmDkEatVGxY6gIXAs2o4yZYdCSYZjzxwX1rok%3D&reserved=0.
I was thinking "close enough" but, it wasn't passing @njmei 's sniff test. Thanks Nick!
There's a fair amount of hand-wringing in the code about whether trace values have NaN's. I checked the 1537 experiments in Peter's output directory. For every valid ROI in 1536 / 1537 experiments, there are no NaNs. For one experiment: 1038171965, the whole file is NaNs - clearly some problem with this one.
I plan to trim back checking for and returning NaNs, as, this appears not to be present in production data coming out of the dff computation queue.
PR is in review with informatics team. I need further discussion with @ledochowitsch about the impacts of upsampling mesoscope data.
For a broader data review and comparison to past results, I chose 54 experiments across cre line and rig. These are the 2 oldest and 2 newest experiments from Peter's previous results. I had to throw out some older experiments that have run through an older version of the DFF queue. Those can be recovered, if desired, by rerunning the DFF queue, but, the DFF traces themselves are slightly different. For comparing event detection results to previous results, I thought that DFF change muddied the water and removed them.
full_genotype rig
Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt CAM2P.3 [1042639858, 1042895986, 942043482, 862848066]
CAM2P.4 [1051011761, 1051216600, 796105304, 783927872]
CAM2P.5 [965218395, 964460159]
MESO.1 [1040982830, 1041171734, 965980160, 886585130]
Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai94(TITL-GCaMP6s)/wt CAM2P.3 [958681110, 950852107]
CAM2P.4 [889777243, 896160394]
CAM2P.5 [977937585, 974945686, 903403819, 904155140]
Sst-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt CAM2P.3 [1052391266, 1051217227, 990381322, 979668410]
CAM2P.4 [1044446322, 1044678595, 993862620, 994790561]
CAM2P.5 [967005387, 960351917, 965930965, 965228792]
MESO.1 [951980477, 959388788, 866518326, 871196365]
Vip-IRES-Cre/wt;Ai148(TIT2L-GC6f-ICL-tTA2)/wt CAM2P.3 [1043833869, 1047030135, 965928394, 960953590]
CAM2P.4 [1053605418, 1053774037, 830700800, 830093338]
CAM2P.5 [809497730, 803736273, 805784331, 808621958]
MESO.1 [1052877371, 1052877376, 936500606, 935514362]
The new module runs successfully through all of these.
I am proposing a modified strategy for mesoscope/11Hz data. I'll use this selection of data to present that proposal and its impacts at the QC-a-thon on Thursday.
I am noticing now that there are no examples of Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai94(TITL-GCaMP6s)/wt
+ MESO.1
, but I don't see that combo in the release dataset either.
This is a notebook for exploring event detection comparisons from the experiment to the trace to the event level:
/allen/aibs/informatics/danielk/event_detection/event_detection_comparison.ipynb
Tasks:
FastLZeroSpikeInference
pretty modular, so that updating to new emerging methods in the future is not too onerous.