AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Make VBN NWB files ingestible #2330

Closed danielsf closed 2 years ago

danielsf commented 2 years ago

Currently, the NWB files generated by the code in #2319 cannot naively be read in with pynwb. We need to fix that.


sgratiy commented 2 years ago

input_json:

/allen/aibs/technology/sergeyg/Projects/vbn/programs/braintv/production/visualbehavior/prod0/specimen_1087519262/ecephys_session_1111216934/BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1111216934_input.json

Issues discovered with constructing EcephysBehaviorSession.from_nwb_path(nwb_path)

1. Missing behavior_session_id: the input json is missing "behavior_session_id", which is required by BehaviorMetadata.from_nwb(nwbfile) and by EcephysBehaviorMetadata.from_nwb(nwbfile), a class that almost entirely duplicates the former. For debugging, I manually added "behavior_session_id": 1234567 to the above input_json.
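A pre-flight check on the input json would surface this kind of gap before the NWB file is written. The following is only a sketch: the helper name `find_missing_keys`, the `REQUIRED_KEYS` tuple, the stand-in payloads, and the assumption that the key sits at the top level of the json are all hypothetical and not taken from AllenSDK.

```python
import json

# Hypothetical list of required keys; only "behavior_session_id" comes from
# the report above, and its position in the real input json is an assumption.
REQUIRED_KEYS = ("behavior_session_id",)


def find_missing_keys(input_json_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the top level of the json."""
    payload = json.loads(input_json_text)
    return [key for key in required if key not in payload]


# Stand-in payloads; the real input json has many more fields, and the
# "ecephys_session_id" key name is made up for illustration.
broken = json.dumps({"ecephys_session_id": 1111216934})
patched = json.dumps(
    {"ecephys_session_id": 1111216934, "behavior_session_id": 1234567}
)

print(find_missing_keys(broken))
print(find_missing_keys(patched))
```

Failing early on the list of missing keys gives a clearer message than an error raised later inside from_nwb.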

2. Mismatch in the naming of raw running speed data: the v_sig, v_in and dx data are added to the NWB file by an old VCN function, add_raw_running_data_to_nwbfile(nwbfile, raw_running_data) from nwb_helper.py. This function uses different names for these variables than RunningAcquisition does. This becomes a problem when reading with EcephysBehaviorSession.from_nwb, because it instantiates a RunningAcquisition object, which fails when it cannot find the required datasets in the NWB file.

In VBO this data is added to the NWB file with RunningAcquisition.from_json(). I tried switching to RunningAcquisition.from_json() and the corresponding self._running_acquisition.to_nwb(nwbfile=nwbfile), but that resulted in an error in running_acq_df = get_running_df(data=stimulus_file.data, time=stimulus_timestamps.value), apparently caused by a size mismatch between stimulus_file.data and stimulus_timestamps.value. The hack solution is to update the variable names, as I did here: add_raw_running_data_to_nwbfile

3. stimulus_file pkl does not have session_uuid

metadata = BehaviorMetadata.from_nwb(nwbfile)

BehaviorSessionUUID.from_nwb(nwbfile=nwbfile) gets None for the UUID and fails, because uuid.UUID expects a hexadecimal string:
---> 31         id = uuid.UUID(metadata.behavior_session_uuid)
     32         return cls(behavior_session_uuid=id)
     33 

/allen/aibs/technology/sergeyg/miniconda2/envs/asdk36/lib/python3.6/uuid.py in __init__(self, hex, bytes, bytes_le, fields, int, version)
    138             hex = hex.strip('{}').replace('-', '')
    139             if len(hex) != 32:
--> 140                 raise ValueError('badly formed hexadecimal UUID string')
    141             int = int_(hex, 16)
    142         if bytes_le is not None:

ValueError: badly formed hexadecimal UUID string

This happens because behavior_session_uuid is None in the created NWB file. It looks like the stimulus_file pkl is missing session_uuid, so the getter method below returns None:

@classmethod
def from_stimulus_file(
        cls, stimulus_file: StimulusFile) -> "BehaviorSessionUUID":
    id = stimulus_file.data.get('session_uuid')
    if id:
        id = uuid.UUID(id)
    return cls(behavior_session_uuid=id)
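
One way to make the read path tolerate a missing UUID is to guard the conversion. The sketch below is not the AllenSDK implementation: the helper name `parse_session_uuid` is hypothetical, and how the missing value is actually stored in the NWB file is an assumption. It relies on standard-library behavior: uuid.UUID raises the ValueError seen in the traceback for any string that is not 32 hex digits (including the text "None"), while the Python object None raises a TypeError instead.

```python
import uuid
from typing import Optional


def parse_session_uuid(raw) -> Optional[uuid.UUID]:
    """Return a UUID, or None when the stored value is missing."""
    # Treat None, empty strings, and the literal text "None" as missing.
    if raw is None or str(raw).strip() in ("", "None"):
        return None
    return uuid.UUID(str(raw))


# Both kinds of missing value fail when passed to uuid.UUID directly.
for bad in ("None", None):
    try:
        uuid.UUID(bad)
    except (ValueError, TypeError) as err:
        print(type(err).__name__, err)

# Placeholder UUID string, used only to show the happy path.
example = "12345678-1234-5678-1234-567812345678"
print(parse_session_uuid(example))
print(parse_session_uuid("None"))
```

Returning None on read mirrors what from_stimulus_file already does when the pkl has no session_uuid.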

4. The stimulus presentations df has NaN values in the 'omitted' column, so the boolean inversion (~) fails.

----> 3 stimuli = Stimuli.from_nwb(nwbfile=nwbfile)
      4 task_parameters = TaskParameters.from_nwb(nwbfile=nwbfile)
      5 trials = TrialTable.from_nwb(nwbfile=nwbfile)

/local1/pyprojects/AllenSDK/allensdk/brain_observatory/behavior/data_objects/stimuli/stimuli.py in from_nwb(cls, nwbfile)
     38     @classmethod
     39     def from_nwb(cls, nwbfile: NWBFile) -> "Stimuli":
---> 40         p = Presentations.from_nwb(nwbfile=nwbfile)
     41         t = Templates.from_nwb(nwbfile=nwbfile)
     42         return Stimuli(presentations=p, templates=t)

/local1/pyprojects/AllenSDK/allensdk/brain_observatory/behavior/data_objects/stimuli/presentations.py in from_nwb(cls, nwbfile)
     79         df = nwbapi.get_stimulus_presentations()
     80 
---> 81         df['is_change'] = is_change_event(stimulus_presentations=df)
     82         df = cls._postprocess(presentations=df, fill_omitted_values=False)
     83         return Presentations(presentations=df)

/local1/pyprojects/AllenSDK/allensdk/brain_observatory/behavior/stimulus_processing.py in is_change_event(stimulus_presentations)
    530 
    531     # exclude omitted stimuli
--> 532     stimuli = stimuli[~stimulus_presentations['omitted']]
    533 
    534     prev_stimuli = stimuli.shift()

/allen/aibs/technology/sergeyg/miniconda2/envs/asdk36/lib/python3.6/site-packages/pandas/core/generic.py in __invert__(self)
   1322             return self
   1323 
-> 1324         new_data = self._mgr.apply(operator.invert)
   1325         result = self._constructor(new_data).__finalize__(self, method="__invert__")
   1326         return result

/allen/aibs/technology/sergeyg/miniconda2/envs/asdk36/lib/python3.6/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    405 
    406             if callable(f):
--> 407                 applied = b.apply(f, **kwargs)
    408             else:
    409                 applied = getattr(b, f)(**kwargs)

/allen/aibs/technology/sergeyg/miniconda2/envs/asdk36/lib/python3.6/site-packages/pandas/core/internals/blocks.py in apply(self, func, **kwargs)
    344         """
    345         with np.errstate(all="ignore"):
--> 346             result = func(self.values, **kwargs)
    347 
    348         return self._split_op_result(result)

TypeError: bad operand type for unary ~: 'float'
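
The failure can be avoided by turning the 'omitted' column into a strict boolean mask before inverting it. This is a sketch under assumptions, not the fix adopted in AllenSDK: the toy table and its values are made up, and treating NaN as "not omitted" is a choice that may not match the intended semantics.

```python
import pandas as pd

# Toy stand-in for the stimulus presentations table; the values are made up.
# Mixing booleans with NaN gives the column object dtype, which is why `~`
# ends up being applied to a float.
presentations = pd.DataFrame(
    {
        "image_name": ["im0", "omitted", "im1", "im1"],
        "omitted": pd.Series([False, True, float("nan"), False], dtype=object),
    }
)

# Comparing against True yields a plain bool mask; NaN compares as False,
# so rows with a missing flag are treated as not omitted (an assumption).
is_omitted = presentations["omitted"].eq(True)

# The mask can now be inverted safely to exclude omitted stimuli.
stimuli = presentations.loc[~is_omitted, "image_name"]
print(stimuli.tolist())
```

If NaN should instead propagate or raise, the normalization step is the place to make that decision explicit.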

danielsf commented 2 years ago

I think there is a bigger hurdle to making the NWB files ingestible than we thought. This code

https://github.com/AllenInstitute/AllenSDK/blob/29_create_vbn_nwb_module/allensdk/brain_observatory/ecephys/ecephys_behavior_session.py#L195-L196

https://github.com/AllenInstitute/AllenSDK/blob/29_create_vbn_nwb_module/allensdk/brain_observatory/ecephys/ecephys_behavior_session.py#L326

breaks our model in that there is no NWBHelper.from_nwb() implemented at this time. I haven't fully thought about what is going to be needed to actually get EcephysBehaviorSession.from_nwb() to run given this construction, but it is an additional barrier beyond the 4 identified above.

danielsf commented 2 years ago

We have decided to abandon the 29_create_vbn_nwb_module branch and re-create the VBN data objects and session objects from a blank slate so that we can be confident that their contents are being written and created correctly.

See #2336