AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Create Visual Coding NWB2 file and document challenges #2125

Closed: wbwakeman closed this issue 3 years ago

wbwakeman commented 3 years ago

This work will support the creation and release of NWB2 files for 4 different Visual Coding projects (i.e., no behavior streams in the data):

For Visual Coding targeted experiment 766270826, manually create an NWB2 (schema version 2.2.5) file for the experiment using PyNWB v1.4.

Document the inputs and how it may be necessary to "modify or circumvent" the allensdk.brain_observatory.behavior.write_nwb module for this non-behavior session.

Data files for this experiment exist at:

/allen/programs/braintv/production/neuralcoding/prod58/specimen_741939348/ophys_session_765884228/ophys_experiment_766270826/

Tasks:

This is a timeboxed effort for a maximum of 5 days.

wbwakeman commented 3 years ago

The LIMS part that collects all the inputs: http://stash.corp.alleninstitute.org/projects/TECH/repos/lims/browse/app/strategies/cam_nwb_strategy.rb

The module: http://stash.corp.alleninstitute.org/projects/INF/repos/lims2_modules/browse/CAM/cam_nwb/run_module.rb

Along with other files in: http://stash.corp.alleninstitute.org/projects/INF/repos/lims2_modules/browse/CAM/cam_nwb

Matyasz commented 3 years ago

Here is my review of what prevents the current pipeline from producing NWB2 files from existing Visual Coding data:

Argschema input failures: when run with the existing data I could find, using the input.json from another job in this queue, validation reports these missing (and unknown) fields:

{
  "session_data": {
    "behavior_session_id": [
      "Missing data for required field."
    ],
    "foraging_id": [
      "Missing data for required field."
    ],
    "events_file": [
      "Missing data for required field."
    ],
    "ophys_session_id": [
      "Missing data for required field."
    ],
    "eye_tracking_filepath": [
      "Missing data for required field."
    ],
    "imaging_plane_group": [
      "Missing data for required field."
    ],
    "plane_group_count": [
      "Missing data for required field."
    ],
    "eye_tracking_rig_geometry": [
      "Missing data for required field."
    ],
    "segmentation_mask_image_file": [
      "Unknown field."
    ]
  }
}
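A minimal sketch for checking an input.json for the absent session_data fields before launching the module. The field list is copied from the validation errors above and is an assumption (it is not the full argschema definition); `find_missing_fields` is a hypothetical helper, not part of AllenSDK.

```python
import json

# Required session_data fields reported missing in the validation output above.
# Assumption: only the fields that failed here, not the complete schema.
REQUIRED_SESSION_FIELDS = [
    "behavior_session_id",
    "foraging_id",
    "events_file",
    "ophys_session_id",
    "eye_tracking_filepath",
    "imaging_plane_group",
    "plane_group_count",
    "eye_tracking_rig_geometry",
]


def find_missing_fields(input_json_text):
    """Return the required session_data fields absent from an input.json string."""
    session_data = json.loads(input_json_text).get("session_data", {})
    return [name for name in REQUIRED_SESSION_FIELDS if name not in session_data]


# Hypothetical input containing only the ophys session ID from the data path.
example = json.dumps({"session_data": {"ophys_session_id": 765884228}})
print(find_missing_fields(example))
```

Running this against each candidate input.json gives a quick list of what LIMS would need to supply (or what the schema would need to make optional) for a Visual Coding session.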

Errors also occur in the asserts in /behavior/write_nwb/__main__.py that check that json_session is the same as both lims_session and nwb_session:

In allensdk/brain_observatory/behavior/metadata/behavior_metadata.py: get_task_parameters tries to reference the behavior column of the input data, but there is none.

date_of_acquisition calls get_behavior_session_id, which also throws a KeyError when trying to access self.data['behavior_session_id']

behavior_session_uuid also calls get_behavior_session_id.

In allensdk/brain_observatory/behavior/metadata/behavior_ophys_metadata.py: The following three property methods throw errors:

In allensdk/brain_observatory/behavior/session_apis/data_io/behavior_nwb_api.py: _add_stimulus_templates causes an error at nwb.add_stimulus_template(…):

File "/allen/aibs/technology/conda/shared/miniconda/envs/asdk_dev/lib/python3.6/site-packages/allensdk/brain_observatory/nwb/__init__.py", line 441, in add_stimulus_template
    for image_name, image_data in stimulus_template.items():
AttributeError: 'NoneType' object has no attribute 'items'

The other stimulus methods called in this method give similar errors.

In allensdk/brain_observatory/behavior/session_apis/data_io/behavior_ophys_nwb_api.py: nwb.add_running_acquisition_to_nwbfile, called from the BehaviorOphysNwbApi save method, leads to:

File "/allen/aibs/technology/conda/shared/miniconda/envs/asdk_dev/lib/python3.6/site-packages/allensdk/brain_observatory/nwb/__init__.py", line 341, in add_running_acquisition_to_nwbfile
    data=running_acquisition_df['dx'].values,
TypeError: 'NoneType' object is not subscriptable

set_omitted_stop_time(stimulus_table=session_object.stimulus_presentations) leads to KeyError: 'omitted' from stimulus_table['omitted']
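One possible workaround, sketched under the assumption that Visual Coding sessions have no omitted flashes: add an all-False 'omitted' column before the stimulus table reaches set_omitted_stop_time. `ensure_omitted_column` is hypothetical, and whether the downstream code then behaves correctly for Visual Coding has not been verified here.

```python
import pandas as pd


def ensure_omitted_column(stimulus_table):
    """Return a copy of stimulus_table that is guaranteed to have an 'omitted' column.

    Hypothetical workaround: when the column is absent, every row is marked
    False (no omitted flashes), and the input table is left unmodified.
    """
    table = stimulus_table.copy()
    if "omitted" not in table.columns:
        table["omitted"] = False
    return table


# Hypothetical two-row presentations table without an 'omitted' column.
presentations = pd.DataFrame({"start_time": [0.0, 0.25], "stop_time": [0.25, 0.5]})
print(ensure_omitted_column(presentations)["omitted"].tolist())  # [False, False]
```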

add_stimulus_presentations has the line stimulus_name_column = get_column_name(stimulus_table.columns, possible_names) which leads to the error KeyError: 'Table expected one name column in intersection, found: []'
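To see why this fails, here is an illustrative re-implementation of an "exactly one name column" lookup that produces the same error message. It is not the AllenSDK source, and the candidate names used below are hypothetical; the point is that a Visual Coding stimulus table sharing none of the expected name columns yields an empty intersection.

```python
def find_single_name_column(columns, possible_names):
    """Return the single column in `columns` that also appears in `possible_names`.

    Illustrative sketch only: raises KeyError when zero or several columns match.
    """
    candidates = set(possible_names)
    matches = [column for column in columns if column in candidates]
    if len(matches) != 1:
        raise KeyError(
            f"Table expected one name column in intersection, found: {matches}"
        )
    return matches[0]


# Hypothetical candidate names and column sets.
names = ["stimulus_name", "image_name"]
print(find_single_name_column(["image_name", "start_time"], names))  # image_name

try:
    find_single_name_column(["start_time", "stop_time"], names)
except KeyError as err:
    print(err)  # 'Table expected one name column in intersection, found: []'
```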

nwb.add_trials(nwbfile, session_object.trials, TRIAL_COLUMN_DESCRIPTION_DICT) tries to use trials[['start_time', 'stop_time']] which causes KeyError: "None of [Index(['start_time', 'stop_time'], dtype='object')] are in the [columns]"

nwb.add_task_parameters(nwbfile, session_object.task_parameters) leads to

TypeError: TypeMap.__get_cls_dict.<locals>.__init__: missing argument 'stimulus_distribution', missing argument 'task', missing argument 'reward_volume', missing argument 'n_stimulus_frames', missing argument 'auto_reward_volume', missing argument 'session_type', missing argument 'response_window_sec', missing argument 'blank_duration_sec', missing argument 'stimulus_duration_sec', missing argument 'omitted_flash_fraction', missing argument 'stimulus'

self.add_events(nwbfile=nwbfile, events=session_object.events) leads to KeyError: 'events' when it tries events['events'].

In allensdk/brain_observatory/behavior/session_apis/data_transforms/behavior_data_transforms.py:

In get_licks, lick_frames = (data["items"]["behavior"]["lick_sensors"][0] gives a KeyError: 'behavior'

get_running_speed will call get_running_acquisition_df which will call get_running_df which will try data["items"]["behavior"]["encoders"][0]["vsig"] and fail with a KeyError: 'behavior'

The get_stimulus_presentations method of the BehaviorDataTransforms class calls the get_stimulus_presentations method from /allensdk/brain_observatory/behavior/stimulus_processing/__init__.py which calls get_visual_stimuli_df from the same file and gives a KeyError: 'behavior' when it tries stimuli = data['items']['behavior']['stimuli']

Similarly the get_stimulus_templates class method calls another method of the same name and tries pkl_stimuli = pkl['items']['behavior']['stimuli'] which leads to a KeyError: 'behavior'

The get_trials method also fails.
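All of the KeyError: 'behavior' failures above share one root cause: the code indexes data["items"]["behavior"] unconditionally. A minimal sketch of a guarded accessor, assuming callers would treat None as "no behavior streams" and skip licks, running speed, rewards and trials; `get_behavior_items` is hypothetical, and the example keys are taken from the comparison of old and new stimulus-file keys in this thread.

```python
def get_behavior_items(stimulus_pkl_data):
    """Return data['items']['behavior'], or None when the file has no such entry.

    Hypothetical helper: Visual Coding stimulus pickles have items such as
    'sync_square', 'foraging' and 'control_stream', but no 'behavior'.
    """
    return stimulus_pkl_data.get("items", {}).get("behavior")


old_format = {"items": {"sync_square": {}, "foraging": {}, "control_stream": {}}}
new_format = {"items": {"behavior": {"lick_sensors": [], "encoders": []}}}
print(get_behavior_items(old_format))  # None
print(sorted(get_behavior_items(new_format)))  # ['encoders', 'lick_sensors']
```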

In allensdk/brain_observatory/behavior/session_apis/data_transforms/behavior_ophys_data_transforms.py:

Same KeyError: 'imaging_plane_group' as earlier in allensdk/brain_observatory/behavior/session_apis/data_io/behavior_ophys_json_api.py.

get_raw_dff_data tries to read the roi_names field from the DFF h5 file, but the file does not have one; it only has a data field.

get_dff_traces calls get_raw_dff_traces which fails as mentioned above.

get_rewards tries pd.DataFrame(data["items"]["behavior"]["trial_log"]), which fails with KeyError: 'behavior'

In get_corrected_fluorescence_traces, the following is raised:

if not np.in1d(cell_roi_id_list, corrected_fluorescence_roi_id).all():
    raise RuntimeError("cell_specimen_table contains ROI IDs "
                       "not present in corrected_fluorescence_traces")

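When this check fires, it helps to know which ROI IDs are missing. A small sketch using made-up ID arrays in place of the real cell_specimen_table and corrected fluorescence file: np.setdiff1d lists the offending IDs, and np.isin is used for the membership test because np.in1d is deprecated as of NumPy 2.0.

```python
import numpy as np

# Hypothetical ROI ID arrays standing in for cell_specimen_table and the
# corrected fluorescence traces file.
cell_roi_id_list = np.array([101, 102, 103, 104])
corrected_fluorescence_roi_id = np.array([101, 102, 104])

# Same membership test as the check above.
all_present = np.isin(cell_roi_id_list, corrected_fluorescence_roi_id).all()

# ROI IDs that would trigger the RuntimeError.
missing_ids = np.setdiff1d(cell_roi_id_list, corrected_fluorescence_roi_id)
print(all_present, missing_ids.tolist())  # False [103]
```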
get_motion_correction fails because the motion correction .csv does not have any columns named x or y. In fact, it may not have any column names at all; this is what I get from printing the head of the data frame:

   0  -3.15372  1.81918  -3.15372.1  1.81918.1  0.1  0.2  0.3  0.493704
0  1 -5.171610  1.40385   -5.171610    1.40385    0    0    0  0.469867
1  2 -4.842250  1.43501   -4.842250    1.43501    0    0    0  0.561573
2  3 -2.241590  1.67159   -2.241590    1.67159    0    0    0  0.514286
3  4 -0.356072  1.70972   -0.356072    1.70972    0    0    0  0.554047
4  5 -1.032390  1.08631   -1.032390    1.08631    0    0    0  0.517384
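The header row printed above ('-3.15372.1', '0.1', ...) is what pandas produces when it consumes the first data row as column names and de-duplicates them. A sketch of reading the old-format file with header=None and the column mapping ["index", "x", "y", "a", "b", "c", "d", "e", "f"] given for the old rigid motion transform file in this thread. The two rows are reconstructed from that printed output, and a comma delimiter is assumed.

```python
import io

import pandas as pd

# Column mapping for the old (headerless) rigid motion transform file.
OLD_MOTION_COLUMNS = ["index", "x", "y", "a", "b", "c", "d", "e", "f"]

# First two rows of the old-format file, reconstructed from the printed head
# (assumption: comma-delimited).
old_csv = io.StringIO(
    "0,-3.15372,1.81918,-3.15372,1.81918,0,0,0,0.493704\n"
    "1,-5.171610,1.40385,-5.171610,1.40385,0,0,0,0.469867\n"
)

# header=None stops pandas from treating the first data row as column names.
motion_df = pd.read_csv(old_csv, header=None, names=OLD_MOTION_COLUMNS)
print(motion_df[["x", "y"]])
```

With the columns named this way, the x and y columns that get_motion_correction looks for are present; the remaining old columns still differ from the new format.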

get_events also fails because there is no events_file in the input.json.

In allensdk/brain_observatory/nwb/__init__.py:

add_running_speed_to_nwbfile fails because there is no speed column in the running_speed data frame passed in (it comes from session_object.running_speed, which is actually empty here).

In allensdk/brain_observatory/sync_dataset.py:

get_edges fails because permissive is set to False, and raises the error KeyError: "none of ['lick_times', 'lick_sensor'] were found in this dataset's line labels"

djkapner commented 3 years ago

It appears that many of these things might just be that key names have changed between visual coding and visual behavior. Is that your impression?

Matyasz commented 3 years ago

Yeah, it looks to me like a combination of things changing names and the input data being organized totally differently: there are no column names in the motion correction file, no event detection file at all, and no roi_names in the dff file.

Matyasz commented 3 years ago

Here are my findings on the differences between the old and new data formats

Data file differences:

events file: file doesn't exist
eye tracking file: file doesn't exist
eye gaze mapping file: file doesn't exist
dff file:
    Old .h5 keys: ['data']
    New .h5 keys: ['data', 'num_small_baseline_frames', 'roi_names', 'sigma_dff']
rigid motion transform file:
    Old data has no column names, but the mapping is ["index", "x", "y", "a", "b", "c", "d", "e", "f"] and can be found here.
    New data columns: ['framenumber', 'x', 'y', 'x_pre_clip', 'y_pre_clip', 'correlation']
    These still do not match up, but the only error I encountered when running the pipeline was related to the 'x' and 'y' columns. The lack of other columns may not pose a problem.
Behavior stimulus file:
    Old data keys: ['config', 'config_path', 'di', 'do', 'droppedframes', 'fps', 'intervalsms', 'items', 'lims_config', 'miniwindow', 'monitor', 'monitor_brightness', 'monitor_contrast', 'movie_output', 'ni_config', 'nidaq_tasks', 'params', 'platform', 'post_blank_sec', 'pre_blank_sec', 'primary_stimulus', 'script', 'scripttext', 'showmouse', 'start_time', 'startdatetime', 'stimuli', 'stop_time', 'stopdatetime', 'sweepstim_text', 'syncpulse', 'syncpulselines', 'syncpulseport', 'syncsqr', 'syncsqrloc', 'syncsqrsize', 'total_frames', 'trigger_delay_sec', 'unpickleable', 'vsynccount', 'wheight', 'window', 'wwidth']
    Old data['items'] keys: ['sync_square', 'foraging', 'control_stream']
    The major issue encountered with this file is that data['items'] has no behavior key.
    New data keys: ['comp_id', 'items', 'platform_info', 'rig_id', 'script', 'session_uuid', 'start_time', 'stop_time', 'threads', 'unpickleable']
    New data['items']['behavior'] keys (behavior is the only key under items): ['ai', 'ao', 'auto_update', 'behavior_path', 'behavior_text', 'cl_params', 'config', 'config_path', 'custom_output_path', 'encoders', 'intervalsms', 'items', 'lick_sensors', 'nidaq_tasks', 'omitted_flash_frame_log', 'params', 'rewards', 'rewards_dispensed', 'stimuli', 'sync_pulse', 'trial_count', 'trial_log', 'unpickleable', 'update_count', 'volume_dispensed', 'window']
sync file: looks good
demixed traces file: looks good

wbwakeman commented 3 years ago

This looks promising.
For the "behavior stimulus file", this Visual Coding data does not have a 'behavior' key, so (theoretically) we just don't need any information under data['items']['behavior']. For the others, we should be able to get them by processing through the pipeline.