AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Create VBN behavior-only NWB files #2314

Closed wbwakeman closed 2 years ago

wbwakeman commented 2 years ago

As with the 2021 release of Visual Behavior ophys data, the April release of Visual Behavior Neuropixels data will have two types of files: NWB files with Neuropixels (ecephys) data and behavior-only NWB files.

Tasks

Note: We may need to get a new monitor_delay value for the rigs in question.

wbwakeman commented 2 years ago

Nick made a good comment in Slack that his PR is useful for historical purposes, but @Adam Amster's Data Object refactor may be more relevant.

The mega PR in question: https://github.com/AllenInstitute/AllenSDK/pull/2225

Previous PRs that were part of that 2225 mega PR:
https://github.com/AllenInstitute/AllenSDK/pull/2104
https://github.com/AllenInstitute/AllenSDK/pull/2153

These may also be useful: http://confluence.corp.alleninstitute.org/display/IT/AllenSDK+Design+Improvements+Brainstorm

We may also want to think of this as two separate pieces of work:

  1. Creating NWB files. This may be mostly done, although there is at least one error about monitor_delay referenced in the issue description.
  2. Using the SDK to access the data in the files for this project.

wbwakeman commented 2 years ago

LIMS db query to get the 4041 behavior-only sessions for the 94 mice that are currently planned for the release. This includes training done on a training rig and on an ecephys rig. It excludes the 188 "real" ecephys sessions for which we have ecephys data to release (2 sessions per mouse).

(Note: query updated at 3/9 17:30 to remove four donors which only had v1.0 geometry)

SELECT bs.id AS behavior_session_id, be.name AS equipment_name, d.full_genotype, d.external_donor_name AS mouse_id, 'todo' AS reporter_line, 'todo' AS driver_line, g.name AS sex, 'todo' AS age_in_days_needs_attention 
, g.name AS cre_line_needs_attention, 'todo' as session_number, 'todo' as prior_exposures_to_session_type, 'todo' as prior_exposures_to_image_set, 'todo' as prior_exposures_to_omissions 
, es.id AS ecephys_session_id, p.code AS project_code, bs.date_of_acquisition, 'todo' AS session_type, 'todo' AS file_id
FROM behavior_sessions bs
LEFT JOIN equipment be ON be.id=bs.equipment_id
LEFT JOIN users bu ON bu.id=bs.user_id

LEFT JOIN ecephys_sessions es ON bs.ecephys_session_id = es.id
LEFT JOIN equipment ee ON ee.id=es.equipment_id
LEFT JOIN projects p ON p.id=es.project_id

JOIN donors d ON d.id=bs.donor_id
JOIN genders g ON g.id=d.gender_id
WHERE 
d.id IN (1000324121,1005252690,1006391440,1022743357,1022743363,1023230290,1023232536,1023232770,1024038404,1024039055,1024938124,1026713886,1029486741,1030967622,1033845075,1033846133,1035469403,1038297549,1038299144,1039843634,1042036158,1043723977,1046926079,1049750648,1051359676,1051360928,1051363699,1051366038,1051905332,1051906227,1051920918,1052679314,1052713734,1052749560,1052760035,1053309580,1054702626,1055401572,1056087380,1056087710,1056092845,1057575664,1057598487,1060089748,1061693281,1062711964,1063385030,1064158477,1064933502,1066191445,1066195455,1067599948,1068696543,1070663443,1071683800,1072728313,1072729465,1074838695,1075310738,1076654843,1076711655,1078585800,1078586885,1079572057,1080378213,1080503454,1083934330,1087316944,1087519142,1087544065,1088250937,1090574186,1091250203,1091281239,1091837607,1095656306,1096936276,1097085666,1097696571,1098595953,1099073123,1100036636,1102157407,1103035848,1104573423,1109521182,1113647000,1114222996,1115936367,1134343253)

--AND bs.ecephys_session_id IS NOT NULL -- 916
--AND es.habituation = 'f' -- 188 ecephys
--AND es.habituation = 't' -- 728 "habituation ecephys"
AND (es.habituation = 't' OR es.id IS NULL)

ORDER BY d.id, bs.date_of_acquisition;
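
The key part of the query above is the combination of the LEFT JOIN on ecephys_sessions with the final habituation filter: a behavior session is kept if it has no ecephys row at all (es.id IS NULL) or if its ecephys row is flagged as habituation. A minimal sketch of that logic, using an in-memory SQLite database with made-up rows rather than LIMS (the table contents and the three session ids are hypothetical, and only the columns the filter touches are modeled):

```python
import sqlite3

# Toy, hypothetical data: this is NOT the LIMS schema, only the columns
# needed to illustrate the behavior-only filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ecephys_sessions (id INTEGER PRIMARY KEY, habituation TEXT);
CREATE TABLE behavior_sessions (
    id INTEGER PRIMARY KEY,
    donor_id INTEGER,
    ecephys_session_id INTEGER,
    date_of_acquisition TEXT
);
INSERT INTO ecephys_sessions VALUES (10, 't'), (11, 'f');
INSERT INTO behavior_sessions VALUES
    (1, 100, NULL, '2021-01-01'),  -- training session, no ecephys row
    (2, 100, 10,   '2021-01-02'),  -- habituation session on an ecephys rig
    (3, 100, 11,   '2021-01-03');  -- "real" ecephys session
""")

# Same join and filter shape as the query above, reduced to the essentials.
rows = conn.execute("""
SELECT bs.id
FROM behavior_sessions bs
LEFT JOIN ecephys_sessions es ON bs.ecephys_session_id = es.id
WHERE es.habituation = 't' OR es.id IS NULL
ORDER BY bs.donor_id, bs.date_of_acquisition
""").fetchall()

behavior_only_ids = [row[0] for row in rows]
print(behavior_only_ids)  # sessions 1 and 2 are kept; session 3 is excluded
```

Because the join is a LEFT JOIN, training sessions without an ecephys counterpart survive with NULL ecephys columns, which is why the es.id IS NULL branch is needed alongside the habituation check.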
wbwakeman commented 2 years ago

My take on the fields needed in the NWB files. It may be useful to double-check that nothing is missing: nwb_field_list.xlsx

Also, feedback from Ben Dichter of the PyNWB project:

I'm excited to see this data, and would be happy to help further once you have some example data for us to look at!

I also want to let you know about a project we are working hard on right now, nwbinspector, a tool to inspect NWB files for compliance with NWB Best Practices. Once you pip-install this, you should be able to run the command line tool and get a report of compliance with best practices. It's still a work-in-progress, but it has come a long way in the last month and at this point I think it would be worth trying out on your own data.
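
As a rough sketch of how that could be wired into a batch check, the snippet below runs the nwbinspector command line tool on each NWB file in a directory. It assumes the tool has been installed with pip and is invoked with the file path as its only argument; the helper names and the directory layout are hypothetical and not part of AllenSDK or nwbinspector.

```python
import shutil
import subprocess
from pathlib import Path


def build_inspect_command(nwb_path):
    """Return the argv for running the nwbinspector CLI on a single file."""
    return ["nwbinspector", str(nwb_path)]


def inspect_all(nwb_dir):
    """Run nwbinspector on every .nwb file directly under nwb_dir.

    Returns a list of (file name, return code, captured stdout) tuples,
    or an empty list if the tool is not installed or no files are found.
    """
    if shutil.which("nwbinspector") is None:
        print("nwbinspector not found; try `pip install nwbinspector`")
        return []
    results = []
    for path in sorted(Path(nwb_dir).glob("*.nwb")):
        completed = subprocess.run(
            build_inspect_command(path), capture_output=True, text=True
        )
        results.append((path.name, completed.returncode, completed.stdout))
    return results
```

Calling inspect_all on a directory of behavior-only NWB files would collect one report per file, which could then be skimmed for best-practice violations before release.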

danielsf commented 2 years ago

The ecephys sessions for release are documented here:

https://github.com/AllenInstitute/ecephys_etl_pipelines/issues/38

We're not sure which behavior-only sessions these imply.

aamster commented 2 years ago

Duplicated by #2559