Closed danielsf closed 2 years ago
I believe the contents of the metadata tables have been agreed upon. Here is a snippet of an email exchange with Corbett (note that there is a path to the example files in my original email to him)
From Corbett:
I've finally had the chance to look at these metadata tables more closely. I like the structure. As we discussed we'll want to add a couple of flags to the session table, but other than that, this looks good to me. One note on content (not sure if this is meaningful at this stage): looks like the 'prior_exposure_to_image_set' is always 0 in the session table (and 'experience_level' is always 'novel'). In the end, this value should indicate the number of times the mouse has seen that image set during training. For about half of the sessions, this will be 0 (the 'novel' days), but for the other half (the 'familiar' ones), this will be a large number (~30-100).
In any case, sorry if I'm jumping the gun on content. But overall this looks good!
From: Scott Daniel scott.daniel@alleninstitute.org Sent: Friday, March 18, 2022 5:06 PM To: Corbett Bennett corbettb@alleninstitute.org Subject: Re: VBO metadata files
Hi Corbett,
I feel analysis paralysis setting in, so why don’t I show you what I’ve got. I’ve put examples of the stripped-down metadata tables I am proposing in here
/allen/aibs/informatics/danielsf/ecephys_table_examples
cartoon/ contains examples run on a few sessions from this release and a few sessions from outside of the release (so we can see what the ecephys_structure_acronyms columns look like, since those annotations are not complete for the release). realistic/ contains the result of running the same code on what I think the set of sessions for this release is (to get the behavior only sessions, I just queried for all of the behavior sessions associated with the mice in the release that did not have a corresponding ecephys_session_id or ophys_session_id).
The legacy/ directory is the old CSV files I showed you a few weeks ago produced by the Visua Coding Neuropixels release code.
The tables in that example directory were generated with a combination of scripts. There is an abortive attempt to create a VBNSessionsTable
in this branch
https://github.com/AllenInstitute/AllenSDK/tree/ticket/ecephys/32/refactor/sessions
I will list LIMS queries for the other tables below
Query for units table
query = """
select
eu.id as unit_id
,eu.ecephys_channel_id
,ep.id as ecephys_probe_id
,es.id as ecephys_session_id
,eu.snr
,eu.firing_rate
,eu.isi_violations
,eu.presence_ratio
,eu.amplitude_cutoff
,eu.isolation_distance
,eu.l_ratio
,eu.d_prime
,eu.nn_hit_rate
,eu.nn_miss_rate
,eu.silhouette_score
,eu.max_drift
,eu.cumulative_drift
,eu.duration as waveform_duration
,eu.halfwidth as waveform_halfwidth
,eu.\"PT_ratio\" as waveform_PT_ratio
,eu.repolarization_slope as waveform_repolarization_slope
,eu.recovery_slope as waveform_recovery_slope
,eu.amplitude as waveform_amplitude
,eu.spread as waveform_spread
,eu.velocity_above as waveform_velocity_above
,eu.velocity_below as waveform_velocity_below
,eu.local_index
,ec.probe_vertical_position
,ec.probe_horizontal_position
,ec.anterior_posterior_ccf_coordinate
,ec.dorsal_ventral_ccf_coordinate
,ec.manual_structure_id as ecephys_structure_id
,st.acronym as ecephys_structure_acronym
"""
query += """
FROM ecephys_units as eu
JOIN ecephys_channels as ec on ec.id = eu.ecephys_channel_id
JOIN ecephys_probes as ep on ep.id = ec.ecephys_probe_id
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
LEFT JOIN structures as st on st.id = ec.manual_structure_id
"""
query += f"""
WHERE es.id IN {sessions}
"""
query for channels table
query = """
select
ec.id as ecephys_channel_id
,ec.ecephys_probe_id
,es.id as ecephys_session_id
,ec.local_index
,ec.probe_vertical_position
,ec.probe_horizontal_position
,ec.anterior_posterior_ccf_coordinate
,ec.dorsal_ventral_ccf_coordinate
,ec.left_right_ccf_coordinate
,st.acronym as ecephys_structure_acronym
,count(distinct(eu.id)) as unit_count
"""
query += """
FROM ecephys_channels as ec
JOIN ecephys_probes as ep on ec.ecephys_probe_id = ep.id
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
JOIN ecephys_units as eu on eu.ecephys_channel_id=ec.id
LEFT JOIN structures st on st.id = ec.manual_structure_id"""
query += f"""
WHERE es.id in {sessions}"""
query += """group by ec.id, es.id, st.acronym"""
query for probes table
query = """
select
ep.id as ecephys_probe_id
,ep.ecephys_session_id
,ep.name
,ep.global_probe_sampling_rate as sampling_rate
,ep.global_probe_lfp_sampling_rate as lfp_sampling_rate
,ep.phase
,ep.use_lfp_data as has_lfp_data
,count(distinct(eu.id)) as unit_count
,count(distinct(ec.id)) as channel_count
,array_agg(distinct(st.acronym)) as ecephys_structure_acronyms"""
query += """
FROM ecephys_probes as ep
JOIN ecephys_sessions as es on ep.ecephys_session_id = es.id
JOIN ecephys_channels as ec on ec.ecephys_probe_id = ep.id
JOIN ecephys_units as eu on eu.ecephys_channel_id=ec.id
LEFT JOIN structures st on st.id = ec.manual_structure_id"""
query += f"""
WHERE es.id in {sessions}"""
query += """group by ep.id"""
Obviously, the contents of the tables I sent to Corbett were not correct, but I think the schema has converged.
The AllenSDK already has an
EcephysProjectCache
class that produces metadata tables and sessions in support of the October 2019 Visual Coding Neuropixels release.Having looked at the metadata tables returned by this class, there are a lot of redundancies (columns that appear identically in more than on table when they could/should appear in only one with a foreign key users can join on).
I've been emailing with Corbett about paring down these repeated columns.
Tasks
Validation
This should probably be completed before code is actually written fro #30