AllenInstitute / ecephys_etl_pipelines

Pipelines and modules for processing extracellular electrophysiology data
Other
1 stars 2 forks source link

Finalize design of Visual Behavior Neuropixels release metadata tables #32

Closed danielsf closed 2 years ago

danielsf commented 2 years ago

The AllenSDK already has an EcephysProjectCache class that produces metadata tables and sessions in support of the October 2019 Visual Coding Neuropixels release.

Having looked at the metadata tables returned by this class, there are a lot of redundancies (columns that appear identically in more than on table when they could/should appear in only one with a foreign key users can join on).

I've been emailing with Corbett about paring down these repeated columns.

Tasks

Validation

This should probably be completed before code is actually written fro #30

danielsf commented 2 years ago

I believe the contents of the metadata tables have been agreed upon. Here is a snippet of an email exchange with Corbett (note that there is a path to the example files in my original email to him)

From Corbett:

I've finally had the chance to look at these metadata tables more closely. I like the structure. As we discussed we'll want to add a couple of flags to the session table, but other than that, this looks good to me. One note on content (not sure if this is meaningful at this stage): looks like the 'prior_exposure_to_image_set' is always 0 in the session table (and 'experience_level' is always 'novel'). In the end, this value should indicate the number of times the mouse has seen that image set during training. For about half of the sessions, this will be 0 (the 'novel' days), but for the other half (the 'familiar' ones), this will be a large number (~30-100).

In any case, sorry if I'm jumping the gun on content. But overall this looks good!

From: Scott Daniel scott.daniel@alleninstitute.org Sent: Friday, March 18, 2022 5:06 PM To: Corbett Bennett corbettb@alleninstitute.org Subject: Re: VBO metadata files

Hi Corbett,

I feel analysis paralysis setting in, so why don’t I show you what I’ve got. I’ve put examples of the stripped-down metadata tables I am proposing in here

/allen/aibs/informatics/danielsf/ecephys_table_examples

cartoon/ contains examples run on a few sessions from this release and a few sessions from outside of the release (so we can see what the ecephys_structure_acronyms columns look like, since those annotations are not complete for the release). realistic/ contains the result of running the same code on what I think the set of sessions for this release is (to get the behavior only sessions, I just queried for all of the behavior sessions associated with the mice in the release that did not have a corresponding ecephys_session_id or ophys_session_id).

The legacy/ directory is the old CSV files I showed you a few weeks ago produced by the Visua Coding Neuropixels release code.

danielsf commented 2 years ago

The tables in that example directory were generated with a combination of scripts. There is an abortive attempt to create a VBNSessionsTable in this branch

https://github.com/AllenInstitute/AllenSDK/tree/ticket/ecephys/32/refactor/sessions

I will list LIMS queries for the other tables below

danielsf commented 2 years ago

Query for units table

query = """
select
eu.id as unit_id
,eu.ecephys_channel_id
,ep.id as ecephys_probe_id
,es.id as ecephys_session_id
,eu.snr
,eu.firing_rate
,eu.isi_violations
,eu.presence_ratio
,eu.amplitude_cutoff
,eu.isolation_distance
,eu.l_ratio
,eu.d_prime
,eu.nn_hit_rate
,eu.nn_miss_rate
,eu.silhouette_score
,eu.max_drift
,eu.cumulative_drift
,eu.duration as waveform_duration
,eu.halfwidth as waveform_halfwidth
,eu.\"PT_ratio\" as waveform_PT_ratio
,eu.repolarization_slope as waveform_repolarization_slope
,eu.recovery_slope as waveform_recovery_slope
,eu.amplitude as waveform_amplitude
,eu.spread as waveform_spread
,eu.velocity_above as waveform_velocity_above
,eu.velocity_below as waveform_velocity_below
,eu.local_index
,ec.probe_vertical_position
,ec.probe_horizontal_position
,ec.anterior_posterior_ccf_coordinate
,ec.dorsal_ventral_ccf_coordinate
,ec.manual_structure_id as ecephys_structure_id
,st.acronym as ecephys_structure_acronym
"""

query += """
FROM ecephys_units as eu
JOIN ecephys_channels as ec on ec.id = eu.ecephys_channel_id
JOIN ecephys_probes as ep on ep.id = ec.ecephys_probe_id
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
LEFT JOIN structures as st on st.id = ec.manual_structure_id
"""

query += f"""
WHERE es.id IN {sessions}
"""
danielsf commented 2 years ago

query for channels table

query = """
select
ec.id as ecephys_channel_id
,ec.ecephys_probe_id
,es.id as ecephys_session_id
,ec.local_index
,ec.probe_vertical_position
,ec.probe_horizontal_position
,ec.anterior_posterior_ccf_coordinate
,ec.dorsal_ventral_ccf_coordinate
,ec.left_right_ccf_coordinate
,st.acronym as ecephys_structure_acronym
,count(distinct(eu.id)) as unit_count
"""

query += """
FROM  ecephys_channels as ec
JOIN ecephys_probes as ep on ec.ecephys_probe_id = ep.id
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
JOIN ecephys_units as eu on eu.ecephys_channel_id=ec.id
LEFT JOIN structures st on st.id = ec.manual_structure_id"""

query += f"""
WHERE es.id in {sessions}"""

query += """group by ec.id, es.id, st.acronym"""
danielsf commented 2 years ago

query for probes table

query = """
select
ep.id as ecephys_probe_id
,ep.ecephys_session_id
,ep.name
,ep.global_probe_sampling_rate as sampling_rate
,ep.global_probe_lfp_sampling_rate as lfp_sampling_rate
,ep.phase
,ep.use_lfp_data as has_lfp_data
,count(distinct(eu.id)) as unit_count
,count(distinct(ec.id)) as channel_count
,array_agg(distinct(st.acronym)) as ecephys_structure_acronyms"""

query += """
FROM  ecephys_probes as ep
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
JOIN ecephys_channels as ec on ec.ecephys_probe_id = ep.id
JOIN ecephys_units as eu on eu.ecephys_channel_id=ec.id
LEFT JOIN structures st on st.id = ec.manual_structure_id"""

query += f"""
WHERE es.id in {sessions}"""

query += """group by ep.id"""
danielsf commented 2 years ago

Obviously, the contents of the tables I sent to Corbett were not correct, but I think the schema has converged.

33 is the ticket where we will actually write the code to generate these tables in production.