AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

BehaviorEcephysSession instantiation fails on lick timestamp alignment #2383

Closed danielsf closed 2 years ago

danielsf commented 2 years ago

Instantiating BehaviorEcephysSessions from the input JSONs in

/allen/aibs/technology/sergeyg/Projects/vbn/input_json_templates

failed for the 82 files listed below, with an error like

3177 lick frames; 3180 lick timestamps in the Sync file. Should be equal

We need to investigate why the lick timestamps and lick frames are not aligned, discuss with Corbett how he wants to proceed, and implement a solution.

To create a BehaviorEcephysSession, run something like

import json
import pathlib

from allensdk.brain_observatory.ecephys.behavior_ecephys_session import (
    BehaviorEcephysSession)


def main():
    # One of the input JSONs; its parent directory holds all of them.
    json_path = pathlib.Path(
        '/allen/aibs/technology/sergeyg/Projects/vbn/input_json_templates/'
        'BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1111216934_input.json')
    src_dir = json_path.parent
    json_path_list = list(src_dir.rglob('*input.json'))
    for json_path in json_path_list:
        with open(json_path, 'rb') as in_file:
            json_data = json.load(in_file)
        # Raises for sessions whose lick frames and lick timestamps disagree.
        session = BehaviorEcephysSession.from_json(session_data=json_data)


if __name__ == '__main__':
    main()

The lick timestamp-to-frame alignment is handled here

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/ecephys/behavior_ecephys_session.py#L105-L150

danielsf commented 2 years ago

The offending sessions are

BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1061463555_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1104058216_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1063010496_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1056495334_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1108335514_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1107172157_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1062755416_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1090800639_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1081429294_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1096935816_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1079018673_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1059678195_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1104289498_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1055415082_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1095340643_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1098350754_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1093638203_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1095138995_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1098119201_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1099869737_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1053925378_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1047977240_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1044385384_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1048189115_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1116941914_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1069461581_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1052530003_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1055403683_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1047969464_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1115368723_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1061238668_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1072572100_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1108528422_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1053941483_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1064415305_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1070961372_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1077712208_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1049514117_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1079275221_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1059908979_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1089296550_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1055240613_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1065905010_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1049273528_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1053718935_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1076265417_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1064639378_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1090803859_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1052533639_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1067588044_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1065437523_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1052331749_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1046581736_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1086198651_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1081079981_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1118508667_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1099598937_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1072345110_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1087720624_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1117148442_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1048196054_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1053709239_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1051155866_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1106985031_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1046166369_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1052342277_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1069193611_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1086433081_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1096620314_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1044389060_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1115086689_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1076487758_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1101263832_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1044597824_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1118327332_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1091039902_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1087992708_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1092466205_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1092283837_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1093867806_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1055221968_input.json
BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_1067781390_input.json

danielsf commented 2 years ago

I'm attaching the text file with all of the failures from the "instantiate every BehaviorEcephysSession" test, in case you are curious when there were more frames than timestamps and when there were more timestamps than frames. The file is here

danielsf commented 2 years ago

@corbennett

It looks like we may have gotten lucky last week. I'm still getting a lot of cases where len(lick_times) != len(lick_frames).

One thing I was not explicit about, though: we have been getting lick_frames the way we did for VBO, i.e. from the pickle file like

lick_frames = pkl_file['items']['behavior']['lick_sensors'][0]['lick_events']

Is this appropriate? Should we instead get lick frames by finding the timestamps of the frames using the vsync_stim line and then assigning each of the lick_times to a frame number using some clever "nearest but not later than" strategy? (Don't think too hard about what I mean by that; really I'm just asking: should we be getting lick_frames from the pickle file at all, or should everything be coming from the sync file?)
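
For illustration only, a "nearest but not later than" assignment could be sketched with numpy.searchsorted. This is not the AllenSDK implementation; the function name assign_licks_to_frames and both input arrays are hypothetical, and the sketch assumes the frame timestamps (e.g. derived from the vsync_stim line) are already available as a sorted 1-D array in the same clock as the lick times.

```python
import numpy as np


def assign_licks_to_frames(lick_times, frame_times):
    """Return, for each lick time, the index of the latest frame whose
    timestamp is at or before the lick ("nearest but not later than").

    Hypothetical helper: frame_times must be sorted in ascending order.
    Licks that occur before the first frame are assigned -1.
    """
    lick_times = np.asarray(lick_times, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    # side='right' counts the frame times <= each lick time, so
    # subtracting 1 yields the index of the last frame not later than
    # the lick (and -1 when no such frame exists).
    return np.searchsorted(frame_times, lick_times, side='right') - 1


# Made-up timestamps, in seconds, purely to exercise the function.
frame_times = np.array([0.0, 1.0, 2.0, 3.0])
lick_times = np.array([0.5, 1.0, 2.9, 3.5, -0.2])
lick_frames = assign_licks_to_frames(lick_times, frame_times)
```

With this approach every lick timestamp in the sync file gets a frame index, so the lengths of lick_times and lick_frames agree by construction.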

Addendum: in most cases, the pickle file records fewer licks than the sync file, which I assume just means that the pickle file dropped something and we should trust the sync file. There are a handful of cases in which the pickle file records more licks than the sync file. This seems a trickier situation (unless our policy really is "always trust the sync file; never trust the pickle file").
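
As a rough sketch of how the two cases could be tallied from the failure messages, the snippet below parses the counts out of a message and labels which file recorded fewer licks. The regular expression is based only on the single example message quoted in this issue, and the function name classify_mismatch and its labels are hypothetical; other messages are assumed, not known, to follow the same format.

```python
import re

# Pattern inferred from the one example message quoted in this issue.
MSG_PATTERN = re.compile(r"(\d+) lick frames; (\d+) lick timestamps")


def classify_mismatch(message):
    """Label a failure message by which source recorded fewer licks.

    Lick frames come from the pickle file and lick timestamps from the
    sync file. Returns None if the message does not match the pattern.
    """
    match = MSG_PATTERN.search(message)
    if match is None:
        return None
    n_frames = int(match.group(1))
    n_times = int(match.group(2))
    if n_frames < n_times:
        return "pickle_fewer"
    if n_frames > n_times:
        return "pickle_more"
    return "equal"


example = ("3177 lick frames; 3180 lick timestamps in the Sync file. "
           "Should be equal")
result = classify_mismatch(example)
```

Here the example message falls in the "pickle_fewer" case, the one described above as the most common.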

danielsf commented 2 years ago

In a 4/26/2022 conversation, Corbett said we should trust the sync file in all cases and ignore the pickle file for now.