Create Visual Coding NWB2 file and document challenges

wbwakeman commented 3 years ago

This work will support creation and release of NWB2 files for 4 different Visual Coding projects (i.e. no behavior streams for the data):

Visual Coding Brain Observatory - the original Brain Observatory data. Files for these experiments were released as NWB version 1
Visual Coding Targeted Experiments - these data were never written to any NWB file
Multiscope Signal Noise - these data were never written to any NWB file
C600 lateral - these data were never written to any NWB file

For Visual Coding targeted experiment 766270826, manually create an NWB2 (schema version 2.2.5) file for the experiment using PyNWB v1.4.

Document the inputs and how it may be necessary to "modify or circumvent" the allensdk.brain_observatory.behavior.write_nwb module for this non-behavior session.

Data files for this experiment exist at:

/allen/programs/braintv/production/neuralcoding/prod58/specimen_741939348/ophys_session_765884228/ophys_experiment_766270826/

Tasks:

[x] Run Visual Behavior NWB generation for the Visual Coding 766270826 experiment
[x] Document differences required to make the file

This is a timeboxed effort for a maximum of 5 days.

wbwakeman commented 3 years ago

LIMS part collects all the inputs: http://stash.corp.alleninstitute.org/projects/TECH/repos/lims/browse/app/strategies/cam_nwb_strategy.rb

The module: http://stash.corp.alleninstitute.org/projects/INF/repos/lims2_modules/browse/CAM/cam_nwb/run_module.rb

Along with other files in: http://stash.corp.alleninstitute.org/projects/INF/repos/lims2_modules/browse/CAM/cam_nwb

Matyasz commented 3 years ago

Here is my review of what prevents the current pipeline from producing NWB2 files from existing Visual Coding data.

The argschema input failures: When run with the existing data that I could find, using the input.json that existed for another job in this queue, we are missing:

{
  "session_data": {
    "behavior_session_id": [
      "Missing data for required field."
    ],
    "foraging_id": [
      "Missing data for required field."
    ],
    "events_file": [
      "Missing data for required field."
    ],
    "ophys_session_id": [
      "Missing data for required field."
    ],
    "eye_tracking_filepath": [
      "Missing data for required field."
    ],
    "imaging_plane_group": [
      "Missing data for required field."
    ],
    "plane_group_count": [
      "Missing data for required field."
    ],
    "eye_tracking_rig_geometry": [
      "Missing data for required field."
    ],
    "segmentation_mask_image_file": [
      "Unknown field."
    ]
  }
}
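The validation errors above can be checked for without running the full module. The following is a minimal sketch using only the standard library. `check_session_data` is a hypothetical helper (not part of AllenSDK or argschema), and the field lists are copied from the error output above rather than from the actual schema, which may require more.

```python
import json

# Fields the error output above reports as missing (copied from that output;
# the real schema may require additional fields).
REQUIRED_FIELDS = [
    "behavior_session_id", "foraging_id", "events_file", "ophys_session_id",
    "eye_tracking_filepath", "imaging_plane_group", "plane_group_count",
    "eye_tracking_rig_geometry",
]
# Fields the error output reports as unknown to the schema.
UNKNOWN_FIELDS = ["segmentation_mask_image_file"]


def check_session_data(input_json: dict) -> dict:
    """Hypothetical helper: list missing and unexpected session_data keys."""
    session_data = input_json.get("session_data", {})
    return {
        "missing": [k for k in REQUIRED_FIELDS if k not in session_data],
        "unknown": [k for k in UNKNOWN_FIELDS if k in session_data],
    }


# Illustrative input only; a real input.json has many more fields.
example = json.loads(
    '{"session_data": {"segmentation_mask_image_file": "mask.png",'
    ' "foraging_id": "abc"}}'
)
report = check_session_data(example)
print(report["missing"])
print(report["unknown"])
```

Running this against a Visual Coding input.json before launching the job would show up front which inputs have to be produced or made optional.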

Errors also occur (in the asserts in /behavior/write_nwb/__main__.py) that check that the json_session is the same as both the lims_ and nwb_sessions:

Lims_session: AssertionError: _average_projection on <allensdk.brain_observatory.behavior.behavior_ophys_experiment.BehaviorOphysExperiment object at 0x7f39bf18c470> did not equal _average_projection on <allensdk.brain_observatory.behavior.behavior_ophys_experiment.BehaviorOphysExperiment object at 0x7f39bf18cc88>
Nwb_session: KeyError: 'behavior_session_id'

In allensdk/brain_observatory/behavior/metadata/behavior_metadata.py: get_task_parameters tries to reference the behavior column of the input data, but there is none.

date_of_acquisition calls get_behavior_session_id, which also throws a KeyError when trying to access self.data['behavior_session_id']

behavior_session_uuid also calls get_behavior_session_id.

In allensdk/brain_observatory/behavior/metadata/behavior_ophys_metadata.py: The following three property methods throw errors:

imaging_plane_group – KeyError: 'imaging_plane_group' in the BehaviorSession class when calling to_dict from the metadata. Specifically, the KeyError occurs in BehaviorOphysJsonExtractor
imaging_plane_group_count – KeyError: 'plane_group_count' also in BehaviorOphysJsonExtractor
ophys_session_id - KeyError: 'ophys_session_id' also in BehaviorOphysJsonExtractor

In allensdk/brain_observatory/behavior/session_apis/data_io/behavior_nwb_api.py: _add_stimulus_templates causes an error at nwb.add_stimulus_template(…):

File "/allen/aibs/technology/conda/shared/miniconda/envs/asdk_dev/lib/python3.6/site-packages/allensdk/brain_observatory/nwb/__init__.py", line 441, in add_stimulus_template
    for image_name, image_data in stimulus_template.items():
AttributeError: 'NoneType' object has no attribute 'items'

The other stimulus methods called in this method also give similar errors

In allensdk/brain_observatory/behavior/session_apis/data_io/behavior_ophys_nwb_api.py: nwb.add_running_acquisition_to_nwbfile goes through BehaviorOphysNwbApi save method and leads to

File "/allen/aibs/technology/conda/shared/miniconda/envs/asdk_dev/lib/python3.6/site-packages/allensdk/brain_observatory/nwb/__init__.py", line 341, in add_running_acquisition_to_nwbfile
    data=running_acquisition_df['dx'].values,
TypeError: 'NoneType' object is not subscriptable

set_omitted_stop_time(stimulus_table=session_object.stimulus_presentations) leads to KeyError: 'omitted' from stimulus_table['omitted']

add_stimulus_presentations has the line stimulus_name_column = get_column_name(stimulus_table.columns, possible_names) which leads to the error KeyError: 'Table expected one name column in intersection, found: []'
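To make the failure mode concrete, here is a rough reimplementation of the lookup, reconstructed only from the error message above. `get_column_name_sketch` and the candidate names are assumptions for illustration; the real `get_column_name` may differ in detail.

```python
def get_column_name_sketch(table_columns, possible_names):
    """Return the single column whose name is in possible_names.

    Sketch only: raises KeyError unless exactly one candidate matches.
    """
    matches = sorted(set(table_columns) & set(possible_names))
    if len(matches) != 1:
        raise KeyError(
            f"Table expected one name column in intersection, found: {matches}"
        )
    return matches[0]


# Hypothetical candidate names, for illustration only.
possible_names = {"stimulus_name", "image_name"}

# A table with exactly one of the candidate columns resolves cleanly.
print(get_column_name_sketch(["start_time", "stop_time", "image_name"],
                             possible_names))

# A table with none of them fails the same way as in the report above.
try:
    get_column_name_sketch(["start_time", "stop_time"], possible_names)
except KeyError as err:
    message = err.args[0]
    print(message)
```

An empty intersection means the stimulus table has none of the expected name columns, which is consistent with the stimulus table being built from a file that has no behavior stimuli.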

nwb.add_trials(nwbfile, session_object.trials, TRIAL_COLUMN_DESCRIPTION_DICT) tries to use trials[['start_time', 'stop_time']] which causes KeyError: "None of [Index(['start_time', 'stop_time'], dtype='object')] are in the [columns]"

nwb.add_task_parameters(nwbfile, session_object.task_parameters) leads to

TypeError: TypeMap.__get_cls_dict.<locals>.__init__: missing argument 'stimulus_distribution', missing argument 'task', missing argument 'reward_volume', missing argument 'n_stimulus_frames', missing argument 'auto_reward_volume', missing argument 'session_type', missing argument 'response_window_sec', missing argument 'blank_duration_sec', missing argument 'stimulus_duration_sec', missing argument 'omitted_flash_fraction', missing argument 'stimulus'

self.add_events(nwbfile=nwbfile, events=session_object.events) leads to KeyError: 'events' when it tries events['events'].

In allensdk/brain_observatory/behavior/session_apis/data_transforms/behavior_data_transforms.py:

In get_licks, lick_frames = (data["items"]["behavior"]["lick_sensors"][0] gives a KeyError: 'behavior'

get_running_speed will call get_running_acquisition_df which will call get_running_df which will try data["items"]["behavior"]["encoders"][0]["vsig"] and fail with a KeyError: 'behavior'

The get_stimulus_presentations method of the BehaviorDataTransforms class calls the get_stimulus_presentations method from /allensdk/brain_observatory/behavior/stimulus_processing/__init__.py which calls get_visual_stimuli_df from the same file and gives a KeyError: 'behavior' when it tries stimuli = data['items']['behavior']['stimuli']

Similarly the get_stimulus_templates class method calls another method of the same name and tries pkl_stimuli = pkl['items']['behavior']['stimuli'] which leads to a KeyError: 'behavior'

The get_trials method also fails.
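All of the KeyError: 'behavior' failures above come from indexing data["items"]["behavior"] directly. One possible way to detect a non-behavior session up front is sketched below. `get_behavior_items` is a hypothetical helper, and the example dictionaries contain only the key names reported in this thread, with values elided.

```python
def get_behavior_items(data: dict):
    """Return data['items']['behavior'] if present, else None."""
    return data.get("items", {}).get("behavior")


# Key names as reported in this thread; the values are placeholders.
visual_coding_pkl = {
    "items": {"sync_square": {}, "foraging": {}, "control_stream": {}}
}
visual_behavior_pkl = {
    "items": {"behavior": {"lick_sensors": [{}], "stimuli": {}}}
}

for name, pkl in [("visual coding", visual_coding_pkl),
                  ("visual behavior", visual_behavior_pkl)]:
    behavior = get_behavior_items(pkl)
    if behavior is None:
        # No behavior stream: licks, rewards and trials cannot be derived.
        print(f"{name}: no behavior stream")
    else:
        print(f"{name}: behavior keys = {sorted(behavior)}")
```

A single check like this would let the licks, running, stimulus and trials transforms be skipped together instead of each failing separately.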

In allensdk/brain_observatory/behavior/session_apis/data_transforms/behavior_ophys_data_transforms.py:

Same KeyError: 'imaging_plane_group' as earlier in allensdk/brain_observatory/behavior/session_apis/data_io/behavior_ophys_json_api.py.

get_raw_dff_data tries to read the roi_names field from the DFF h5 file, but the file does not have one. It only has a data field.

get_dff_traces calls get_raw_dff_traces which fails as mentioned above.

get_rewards tries pd.DataFrame(data["items"]["behavior"]["trial_log"]), which fails with KeyError: 'behavior'

In get_corrected_fluorescence_traces, the following is raised

if not np.in1d(cell_roi_id_list, corrected_fluorescence_roi_id).all():
    raise RuntimeError("cell_specimen_table contains ROI IDs "
                       "not present in corrected_fluorescence_traces")

get_motion_correction fails because the motion correction .csv does not have any columns named x or y. In fact, it looks like it may not have any column names at all. This is what I get from printing the head of the data frame:

   0  -3.15372  1.81918  -3.15372.1  1.81918.1  0.1  0.2  0.3  0.493704
0  1 -5.171610  1.40385   -5.171610    1.40385    0    0    0  0.469867
1  2 -4.842250  1.43501   -4.842250    1.43501    0    0    0  0.561573
2  3 -2.241590  1.67159   -2.241590    1.67159    0    0    0  0.514286
3  4 -0.356072  1.70972   -0.356072    1.70972    0    0    0  0.554047
4  5 -1.032390  1.08631   -1.032390    1.08631    0    0    0  0.517384
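The printed head suggests that the first data row was consumed as a header. A minimal sketch of reading such a file with explicit column names follows, using the column mapping given later in this thread. It assumes the file is comma-delimited and that the line pandas treated as a header is really frame 0; `read_headerless_motion_csv` is a hypothetical helper.

```python
import csv
import io

# Column mapping reported later in this thread for the old, headerless files.
OLD_COLUMNS = ["index", "x", "y", "a", "b", "c", "d", "e", "f"]


def read_headerless_motion_csv(handle):
    """Read a headerless motion-correction CSV into a list of dicts."""
    rows = []
    for values in csv.reader(handle):
        if not values:
            continue  # skip blank lines
        rows.append(dict(zip(OLD_COLUMNS, (float(v) for v in values))))
    return rows


# First two rows reconstructed from the printed head above
# (assumption: comma-delimited, no header line).
sample = io.StringIO(
    "0,-3.15372,1.81918,-3.15372,1.81918,0,0,0,0.493704\n"
    "1,-5.171610,1.40385,-5.171610,1.40385,0,0,0,0.469867\n"
)
rows = read_headerless_motion_csv(sample)
print(rows[0]["x"], rows[0]["y"])
```

With pandas, the equivalent would be `pd.read_csv(path, header=None, names=OLD_COLUMNS)`, which yields the `x` and `y` columns that get_motion_correction looks for.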

get_events also failed: there is no events_file in the input.json.

In allensdk/brain_observatory/nwb/__init__.py:

add_running_speed_to_nwbfile fails because there is no speed column in the running_speed data frame passed in (it comes from session_object.running_speed, which is actually empty here)

In allensdk/brain_observatory/sync_dataset.py:

get_edges fails because permissive is set to False, and raises the error KeyError: "none of ['lick_times', 'lick_sensor'] were found in this dataset's line labels"
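The strict versus permissive behavior described above can be illustrated with a small stand-in. This is a sketch of the idea only, not the actual sync_dataset.py implementation; `find_line` and the example line labels are hypothetical.

```python
def find_line(line_labels, candidate_keys, permissive=False):
    """Return the first candidate key present in line_labels.

    When no candidate is present, return None if permissive is True,
    otherwise raise KeyError.
    """
    for key in candidate_keys:
        if key in line_labels:
            return key
    if permissive:
        return None
    raise KeyError(
        f"none of {candidate_keys} were found in this dataset's line labels"
    )


# Hypothetical line labels for a session recorded without a lick sensor.
line_labels = ["2p_vsync", "stim_vsync", "stim_photodiode"]
lick_keys = ["lick_times", "lick_sensor"]

# Permissive lookup: a missing lick line is reported as None.
lick_line = find_line(line_labels, lick_keys, permissive=True)
print(lick_line)

# Strict lookup: the same situation raises, as in the error above.
try:
    find_line(line_labels, lick_keys)
except KeyError as err:
    strict_error = err.args[0]
    print(strict_error)
```

For a non-behavior session, a permissive lookup lets the caller treat a missing lick line as "no licks recorded" rather than as a fatal error.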

djkapner commented 3 years ago

It appears that many of these things might just be that key names have changed between visual coding and visual behavior. Is that your impression?

Matyasz commented 3 years ago

Yeah, it looks to me like a combination of things changing names, and also the input data being organized totally differently. Like how there are just no column names in the motion correction file, no event detection file at all, and no roi_names in the dff file

Matyasz commented 3 years ago

Here are my findings on the differences between the old and new data formats

| Data file | Differences |
| --- | --- |
| events file | file doesn't exist |
| eye tracking file | file doesn't exist |
| eye gaze mapping | file doesn't exist |
| dff file | Old .h5 keys: `['data']`. New .h5 keys: `['data', 'num_small_baseline_frames', 'roi_names', 'sigma_dff']`. |
| rigid motion transform file | Old data has no column names, but the mapping is `["index", "x", "y", "a", "b", "c", "d", "e", "f"]` and can be found here. New data columns: `['framenumber', 'x', 'y', 'x_pre_clip', 'y_pre_clip', 'correlation']`. These still do not match up, but the only error I encountered when running the pipeline was related to the `x` and `y` columns. The lack of other columns may not pose a problem. |
| behavior stimulus file | Old data keys: `['config', 'config_path', 'di', 'do', 'droppedframes', 'fps', 'intervalsms', 'items', 'lims_config', 'miniwindow', 'monitor', 'monitor_brightness', 'monitor_contrast', 'movie_output', 'ni_config', 'nidaq_tasks', 'params', 'platform', 'post_blank_sec', 'pre_blank_sec', 'primary_stimulus', 'script', 'scripttext', 'showmouse', 'start_time', 'startdatetime', 'stimuli', 'stop_time', 'stopdatetime', 'sweepstim_text', 'syncpulse', 'syncpulselines', 'syncpulseport', 'syncsqr', 'syncsqrloc', 'syncsqrsize', 'total_frames', 'trigger_delay_sec', 'unpickleable', 'vsynccount', 'wheight', 'window', 'wwidth']`. Old `data['items']` keys: `['sync_square', 'foraging', 'control_stream']`. The major issue encountered with this file is that `data['items']` has no `behavior` key. New data keys: `['comp_id', 'items', 'platform_info', 'rig_id', 'script', 'session_uuid', 'start_time', 'stop_time', 'threads', 'unpickleable']`. New `data['items']['behavior']` keys (`behavior` is the only key under `items`): `['ai', 'ao', 'auto_update', 'behavior_path', 'behavior_text', 'cl_params', 'config', 'config_path', 'custom_output_path', 'encoders', 'intervalsms', 'items', 'lick_sensors', 'nidaq_tasks', 'omitted_flash_frame_log', 'params', 'rewards', 'rewards_dispensed', 'stimuli', 'sync_pulse', 'trial_count', 'trial_log', 'unpickleable', 'update_count', 'volume_dispensed', 'window']`. |
| sync file | looks good |
| demixed traces file | looks good |

wbwakeman commented 3 years ago

This looks promising.
For the "behavior stimulus file", this Visual Coding data does not have 'behavior', so (theoretically) we just don't need any information under data['items']['behavior']. For the others, we should be able to get them by processing through the pipeline.

AllenInstitute / AllenSDK

Create Visual Coding NWB2 file and document challenges #2125