djkapner commented 4 years ago

fixes a bug in writing to experiment_table when the query results were not all the same length.
formalizes how I was running many segmentation jobs at once, using an array job and a driver file (the segmentation manifest).
adds a module that creates a segmentation manifest from an experiment_table entry.
adds a README to slapp.data_selection.

Following the README:

$ python -m slapp.data_selection.select_data --input_json test_input.json 
INFO:DataSelector:SELECT oe.id as exp_id, idep.depth, genotypes.name FROM ophys_experiments as oe JOIN ophys_sessions as os on oe.ophys_session_id=os.id JOIN projects on projects.id=os.project_id JOIN imaging_depths as idep on idep.id = oe.imaging_depth_id JOIN specimens on os.specimen_id=specimens.id JOIN donors on specimens.donor_id=donors.id JOIN donors_genotypes dg ON dg.donor_id=donors.id JOIN genotypes ON genotypes.id=dg.genotype_id JOIN genotype_types as gt on gt.id=genotypes.genotype_type_id WHERE gt.name='driver' AND oe.workflow_state = 'passed' AND projects.code='VisualBehaviorMultiscope' AND genotypes.name='Sst-IRES-Cre' AND idep.depth BETWEEN 100 and 300
 returned 171 ids
INFO:DataSelector:SELECT oe.id as exp_id, idep.depth, genotypes.name FROM ophys_experiments as oe JOIN ophys_sessions as os on oe.ophys_session_id=os.id JOIN projects on projects.id=os.project_id JOIN imaging_depths as idep on idep.id = oe.imaging_depth_id JOIN specimens on os.specimen_id=specimens.id JOIN donors on specimens.donor_id=donors.id JOIN donors_genotypes dg ON dg.donor_id=donors.id JOIN genotypes ON genotypes.id=dg.genotype_id JOIN genotype_types as gt on gt.id=genotypes.genotype_type_id WHERE gt.name='driver' AND oe.workflow_state = 'passed' AND projects.code='VisualBehaviorTask1B' AND genotypes.name='Sst-IRES-Cre' AND idep.depth BETWEEN 100 and 300
 returned 49 ids
INFO:DataSelector:SELECT oe.id as exp_id, idep.depth, genotypes.name FROM ophys_experiments as oe JOIN ophys_sessions as os on oe.ophys_session_id=os.id JOIN projects on projects.id=os.project_id JOIN imaging_depths as idep on idep.id = oe.imaging_depth_id JOIN specimens on os.specimen_id=specimens.id JOIN donors on specimens.donor_id=donors.id JOIN donors_genotypes dg ON dg.donor_id=donors.id JOIN genotypes ON genotypes.id=dg.genotype_id JOIN genotype_types as gt on gt.id=genotypes.genotype_type_id WHERE gt.name='driver' AND oe.workflow_state = 'passed' AND projects.code='VisualBehaviorMultiscope' AND genotypes.name='Slc17a7-IRES2-Cre' AND idep.depth BETWEEN 100 and 200
 returned 99 ids
INFO:DataSelector:SELECT oe.id as exp_id, idep.depth, genotypes.name FROM ophys_experiments as oe JOIN ophys_sessions as os on oe.ophys_session_id=os.id JOIN projects on projects.id=os.project_id JOIN imaging_depths as idep on idep.id = oe.imaging_depth_id JOIN specimens on os.specimen_id=specimens.id JOIN donors on specimens.donor_id=donors.id JOIN donors_genotypes dg ON dg.donor_id=donors.id JOIN genotypes ON genotypes.id=dg.genotype_id JOIN genotype_types as gt on gt.id=genotypes.genotype_type_id WHERE gt.name='driver' AND oe.workflow_state = 'passed' AND projects.code='VisualBehaviorTask1B' AND genotypes.name='Slc17a7-IRES2-Cre' AND idep.depth BETWEEN 100 and 200
 returned 47 ids
INFO:DataSelector:sub-selected ids [851093291, 977247468, 866518324, 853363749, 865798237, 951980481, 982903847, 986518870, 977978331, 986518863, 982903843, 957759564, 978296102, 988707128, 960995086, 867410514, 988707124, 867410520, 978296114, 953659743, 977247476, 867410518, 960995077, 982903853, 977978329, 853363739, 987317107, 856123119, 856123117, 953659752, 982344777, 853988446, 871196369, 976300303, 866518318, 871196375, 957759568, 866518326, 960995084, 868870094, 871196377, 850517344, 986518852, 866518314, 957759562, 956941841, 959388794, 989212489, 959388790, 976300297, 987317101, 864967106, 866518316, 977978327, 951980479, 854759894, 866518293, 871196365, 850517352, 875786885, 853988430, 864430668, 956941848, 868870092, 857698006, 868870085, 959388798, 977247474, 865798247, 958527479, 864967102, 988707126, 977247472, 873963899, 864430666, 994053903, 1002314807, 989191384, 1003771249, 1010812025, 986402309, 979668410, 960960480, 993344860, 957652800, 1001535125, 978827848, 993862620, 994061182, 1003456269, 1012112426, 995439942, 994790561, 984551228, 993593393, 919419001, 974433390, 889806727, 905955238, 889806719, 908381674, 932381896, 908381700, 914107592, 990681006, 886585126, 974433399, 909184300, 914580660, 989610989, 915243090, 972233193, 916220450, 905955228, 929603796, 887386949, 904363934, 886003523, 989610985, 929603805, 901559828, 935440149, 971761068, 934456506, 1011751579, 906877227, 932333410, 972683314, 994278291, 915141818]
INFO:DataSelector:results added to postgres table
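
The four logged queries differ only in project code, driver genotype, and depth range. A minimal sketch of how such query strings could be assembled is below. The names `build_query` and `build_queries` and the shape of their inputs are hypothetical and are not taken from `slapp.data_selection`; real code would normally pass the values as bound parameters instead of formatting them into the string.

```python
from itertools import product

# Shared SELECT/JOIN/WHERE prefix, copied from the logged queries above.
BASE_QUERY = (
    "SELECT oe.id as exp_id, idep.depth, genotypes.name "
    "FROM ophys_experiments as oe "
    "JOIN ophys_sessions as os on oe.ophys_session_id=os.id "
    "JOIN projects on projects.id=os.project_id "
    "JOIN imaging_depths as idep on idep.id = oe.imaging_depth_id "
    "JOIN specimens on os.specimen_id=specimens.id "
    "JOIN donors on specimens.donor_id=donors.id "
    "JOIN donors_genotypes dg ON dg.donor_id=donors.id "
    "JOIN genotypes ON genotypes.id=dg.genotype_id "
    "JOIN genotype_types as gt on gt.id=genotypes.genotype_type_id "
    "WHERE gt.name='driver' AND oe.workflow_state = 'passed'"
)


def build_query(project_code, genotype, depth_range):
    """Append the per-query filters to the shared prefix (hypothetical helper)."""
    depth_min, depth_max = depth_range
    return (
        f"{BASE_QUERY} AND projects.code='{project_code}' "
        f"AND genotypes.name='{genotype}' "
        f"AND idep.depth BETWEEN {depth_min} and {depth_max}"
    )


def build_queries(project_codes, genotype_depths):
    """One query per (genotype, project) pair, genotype varying slowest."""
    return [
        build_query(project, genotype, depths)
        for (genotype, depths), project in product(
            genotype_depths.items(), project_codes
        )
    ]


# Values taken from the log output above; the input layout is an assumption.
queries = build_queries(
    project_codes=["VisualBehaviorMultiscope", "VisualBehaviorTask1B"],
    genotype_depths={
        "Sst-IRES-Cre": (100, 300),
        "Slc17a7-IRES2-Cre": (100, 200),
    },
)
for query in queries:
    print(query)
```

Each printed string should correspond to one of the `INFO:DataSelector:SELECT ...` lines, in the same order.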

This created entry 6 in the experiment_selection table.

the next step in the README, building a segmentation manifest:

$ python -m slapp.data_selection.segmentation_manifest --experiment_selection_id 6 --output_json ./example_output.json --log_level INFO
INFO:SegmentationManifest:selected 130 experiments from table experiment_selection
INFO:SegmentationManifest:wrote ./example_output.json

and

$ head -n 15 example_output.json 
{
  "manifest": [
    {
      "input_video": "/allen/programs/braintv/production/neuralcoding/prod0/specimen_813702151/ophys_session_849304162/ophys_experiment_850517344/processed/motion_corrected_video.h5",
      "log_level": "ERROR",
      "experiment_id": 850517344,
      "nbinned": 420
    },
    {
      "input_video": "/allen/programs/braintv/production/neuralcoding/prod0/specimen_813702151/ophys_session_849304162/ophys_experiment_850517352/processed/motion_corrected_video.h5",
      "log_level": "ERROR",
      "experiment_id": 850517352,
      "nbinned": 420
    },
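
For context, here is a rough sketch of how one task in an array job might pick its own entry out of this manifest. The function name `load_manifest_entry` is hypothetical, and this PR does not show how the task index actually reaches the job; typically it would come from an environment variable set by the scheduler. The example writes the two entries shown above to a temporary file so that it runs on its own.

```python
import json
import os
import tempfile


def load_manifest_entry(manifest_path, task_index):
    """Return the manifest entry for one array-job task (hypothetical helper)."""
    with open(manifest_path) as f:
        entries = json.load(f)["manifest"]
    if not 0 <= task_index < len(entries):
        raise IndexError(
            f"task index {task_index} outside manifest of {len(entries)} entries"
        )
    return entries[task_index]


# The first two entries from the `head` output above.
prefix = (
    "/allen/programs/braintv/production/neuralcoding/prod0/"
    "specimen_813702151/ophys_session_849304162/"
)
example = {
    "manifest": [
        {
            "input_video": prefix
            + "ophys_experiment_850517344/processed/motion_corrected_video.h5",
            "log_level": "ERROR",
            "experiment_id": 850517344,
            "nbinned": 420,
        },
        {
            "input_video": prefix
            + "ophys_experiment_850517352/processed/motion_corrected_video.h5",
            "log_level": "ERROR",
            "experiment_id": 850517352,
            "nbinned": 420,
        },
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    manifest_path = os.path.join(tmp_dir, "example_output.json")
    with open(manifest_path, "w") as f:
        json.dump(example, f, indent=2)
    # In a real array job the index would be supplied by the scheduler.
    first_entry = load_manifest_entry(manifest_path, 0)
    second_entry = load_manifest_entry(manifest_path, 1)

print(first_entry["experiment_id"], second_entry["experiment_id"])
```

Keeping one manifest entry per task means a failed task can be rerun by index without regenerating the manifest.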

codecov-io commented 4 years ago

Codecov Report

Merging #101 into master will increase coverage by 0.62%. The diff coverage is 96.61%.

@@            Coverage Diff             @@
##           master     #101      +/-   ##
==========================================
+ Coverage   90.18%   90.80%   +0.62%     
==========================================
  Files          11       13       +2     
  Lines         550      609      +59     
==========================================
+ Hits          496      553      +57     
- Misses         54       56       +2

Impacted Files	Coverage Δ
slapp/data_selection/segmentation_manifest.py	`94.87% <94.87%> (ø)`
slapp/data_selection/select_data.py	`100.00% <100.00%> (ø)`
slapp/data_selection/utils.py	`100.00% <100.00%> (ø)`


kschelonka commented 4 years ago

I got confused reading this document. I think part of the reason is that it assumes we already know a lot about ophys_segmentation and the structure of our labeling database. I appreciate how specific you get in the commands, but I'd like to see more summary information to guide the reader, plus some discussion of how the examples are structured. It's easy to get stuck in these long query strings.

For example, for the Experiment Selection section, I think it would help the reader not get lost if you outlined the process before providing code examples. Something like:

The select_data module handles data selection. Based on the input_json passed, it queries the LIMS database to retrieve a list of the appropriate experiment_ids. For reproducibility, it dumps all the IDs and some metadata in an entry in the experiment_selection table in our Labeling Database. The entry contains the following fields: < table of fields and explanation>

... Or something like that.

AllenInstitute / segmentation-labeling-app

Feature/segmentation driver file #101
