AllenInstitute / ecephys_etl_pipelines

Pipelines and modules for processing extracellular electrophysiology data
Other
1 stars 2 forks source link

Fix omission frame logging in VBN stimulus table #39

Closed danielsf closed 2 years ago

danielsf commented 2 years ago

In this SDK ticket Corbett reports that the VBN data has the same off-by-one-frame error in omission presentations that Farzaneh reported for the VBO in spring 2020. For the purposes of VBN, that issue actually originates in ecephys_etl_pipelines, since the stimulus table is created by the ecephys_etl.modules.vbn_create_stimulus_table module (which, I believe, is called by the EPHYS_NWB_STIMULUS_SUMMARY_V3_QUEUE.

The purpose of this ticket is to fix the bug in omission frame reporting (See write up in the SDK ticket referenced above) so that we can re-run that LIMS queue for the ecephys sessions being released with VBN.

Tasks

Here is a script which can reproduce the original bug as reported in SDK 1539, but reading the data directly from the stimulus csv file.

import pathlib
import json
import numpy as np

from allensdk.brain_observatory.behavior.data_files.stimulus_file import BehaviorStimulusFile
from allensdk.brain_observatory.behavior.data_objects.stimuli.presentations \
    import \
    Presentations
from allensdk.brain_observatory.behavior.data_objects.stimuli.templates \
    import \
    Templates
from allensdk.brain_observatory.behavior.data_objects.stimuli.stimuli import \
    Stimuli

session_id = 1044594870
json_dir = pathlib.Path('/allen/programs/mindscope/workgroups/np-behavior/vbn_data_release/input_jsons')

json_path = json_dir / f'BEHAVIOR_ECEPHYS_WRITE_NWB_QUEUE_{session_id}_input.json'
assert json_path.is_file()

with open(json_path, 'rb') as in_file:
    json_data = json.load(in_file)
stim_table_file = json_data['session_data']['stim_table_file']
stimulus_file = json_data['session_data']['behavior_stimulus_file']

print(f'stimulus_table_file: {stim_table_file}')
print(f'behavior_stimulus_file: {stimulus_file}')

behavior_stimulus_file = BehaviorStimulusFile(filepath=stimulus_file)

stimuli = Stimuli(
    presentations=Presentations.from_path(
      path=stim_table_file,
      exclude_columns=None
    ),
    templates=Templates.from_stimulus_file(
        stimulus_file=behavior_stimulus_file)
)

stimulus_presentations = stimuli.presentations.value

inter_flash_interval = stimulus_presentations.loc[stimulus_presentations['active']].start_time.diff()
active = stimulus_presentations.loc[stimulus_presentations['active']]
print('pre-omission interval: ', inter_flash_interval.values[active['omitted'].astype(bool)].mean())
print('post-omission interval: ', inter_flash_interval.values[np.insert(active['omitted'].astype(bool).values, 
False, 0)[:-1]].mean())
print('standard interval: ', inter_flash_interval.mean())

Which, as of 5/11/2022, produces the following output

stimulus_table_file: /allen/programs/braintv/production/visualbehavior/prod0/specimen_1023232540/ecephys_session_1044594870/VbnCreateStimTableStrategy/analysis_run_1159864663/1044594870_VBN_stimulus_table.csv
behavior_stimulus_file: /allen/programs/braintv/production/visualbehavior/prod0/specimen_1023232540/ecephys_session_1044594870/1044791091/1044594870_524761_20200820.behavior.pkl
pre-omission interval:  0.734256241138249
post-omission interval:  0.7674120710128252
standard interval:  0.7507912380348116
corbennett commented 2 years ago

@danielsf Here's an update: There are two possibilities for how this might happen:

1) omission times are correct, and all the flash times are off by one 2) flash times are correct and the omissions are off by one

We've assumed (2), but digging into this more, I think (1) is what's more likely. To test the timing of the stimulus presentations more carefully, I placed a diode on the screen to measure the actual stimulus display. When I align this data to the stimulus as we do in the current SDK, I get the following plot:

image

Note that the change in screen luminance actually precedes the nominal 'flash time' from the stim table by ~1 frame.

Making the same plot for the flash stimuli from the rf mapping pkl, I get the following: image

This conforms to our expectations (the diode registers a change in the screen very close to when the stim table says we should get one). There's a 3-4 ms delay which I attribute to different parts of the screen rending at different times in the draw cycle.

So, it seems to me that the VBA code that generates the flash times for the behavior stim table is actually incorrectly assigning the stim frames (but correctly assigning the omission frames).

danielsf commented 2 years ago

@corbennett

Can you share the code you used to generate the two plots? I'm concerned that there seems to be a problem in the behavior pickle file but not in the mapping pickle file. Naively, I would have expected the two pickle files to be processed similarly. (Or maybe I am misunderstanding your post).

Doing my own digging, I have found a phenomenological cause for the original problem. In this block of code

https://github.com/AllenInstitute/ecephys_etl_pipelines/blob/main/src/ecephys_etl/data_transformers/visual_behavior_stimulus_processing.py#L159-L163

we increment the start and end frames for visual stimuli presented to the mice.

Nothing similar is done here

https://github.com/AllenInstitute/ecephys_etl_pipelines/blob/main/src/ecephys_etl/data_transformers/visual_behavior_stimulus_processing.py#L190

we just take the frame indices for omissions as-is from the pickle file.

If I either add a += 1 to omitted_flash_frames, or comment out the block of code at lines 161-162, I get the expected pre- and post-omission intervals.

This may actually be consistent with your findings, since only the behavior stimulus block is run through the code I linked to above. Is the right course of action to remove the increment-by-one block at lines 161, 162?

danielsf commented 2 years ago

The bit of code that I would like to change dates back to a 2019 commit to the SDK, specifically this one

https://github.com/AllenInstitute/AllenSDK/commit/5670f087291feef0f3dd8c7f36bd482356019df3

lines 125-129 of

allensdk/brain_observatory/behavior/stimulus_processing.py

It was merged as a part of this PR https://github.com/AllenInstitute/AllenSDK/pull/394

So I'm guessing it was code that was copy-and-pasted over from the visual_behavior_analysis repo. The next step is to determine whether or not visual_behavior_analysis still increments those to frame indices by 1.

They do:

https://github.com/AllenInstitute/visual_behavior_analysis/blob/master/visual_behavior/ophys/dataset/stimulus_processing.py#L189-L194

Curiously, the commit that adds that code to visual_behavior_analysis comes from November 2 2019, which means it post-dates the code in the SDK by about 5 months

https://github.com/AllenInstitute/visual_behavior_analysis/commit/5c6302562e326c50f23f634f17844cde8522475e

see lines 181-185 in

visual_behavior/ophys/dataset/stimulus_processing.py
corbennett commented 2 years ago

TLDR: Scott's fix worked and we should implement it. Then, we should add a +8 ms offset to the stim table to account for the delay between when the top of the screen renders (where the photodiode is placed) and when the center of the screen renders (where our receptive fields are).

@danielsf commented out lines 161, and 162 as suggested above and generated a stimulus table for a photodiode test experiment I ran this week. To test how long it takes the screen to draw, I used two diodes: one at the top of the screen and one in the center. Here are the results:

image

You can see that the timing now looks identical across the behavior, mapping and passive stims. The diode at the top of the screen draws ~8 ms faster than the one in the center, which is expected of LCD screens running at 60 Hz (https://display-corner.epfl.ch/index.php/LCD_dynamics). The top diode is about midway through drawing at the "0" time point, which is also expected, since it is very close to the photodiode used to determine the frame timestamps. This signal is digitized to create the diode sync line, so the '0' timepoint indicates when the diode signal crosses some threshold (and not the very beginning of the draw). This explains why the top of the screen begins drawing just before 0.

I think the stim table should reflect when the center of the screen rendered, since that's the retinotopic region we target. So, I think we should use Scott's fix, but add an 8 ms offset to the rows of the stim table accounting for the lag between draw times for the photodiode and the center of the screen.

Here's the code I used to create these plots:

import os
import pandas as pd
import numpy as np
from allensdk.brain_observatory.sync_dataset import \
    Dataset as SyncDataset
from allensdk.brain_observatory.ecephys.align_timestamps.barcode import extract_barcodes_from_times, get_probe_time_offset
from matplotlib import pyplot as plt

# GET DATA
syncDataset = SyncDataset(r"\\allen\programs\mindscope\workgroups\np-exp\VBN_timing_validation\20220516T183524.h5")
datPath = r"\\allen\programs\mindscope\workgroups\np-exp\VBN_timing_validation\2022-05-16_18-35-30\Record Node 105\experiment1\recording1\continuous\NI-DAQmx-103.0\continuous.dat"
ttlStatesPath = r"\\allen\programs\mindscope\workgroups\np-exp\VBN_timing_validation\2022-05-16_18-35-30\Record Node 105\experiment1\recording1\events\NI-DAQmx-103.0\TTL_1\channel_states.npy"
ttlTimestampsPath = os.path.join(os.path.dirname(ttlStatesPath),'timestamps.npy')
datTimestampsPath = os.path.join(os.path.dirname(datPath),'timestamps.npy')
stim_table = pd.read_csv(r"\\allen\programs\mindscope\workgroups\np-exp\VBN_timing_validation\stim_table\stim_table_220517_1204.csv")

diodeChannels = [0, 2] #analog channels for diode signals, first is center of screen, second is top
ephysSampleRate = 30000
numAnalogCh = 8
datData = np.memmap(datPath,dtype='int16',mode='r')    
datData = np.reshape(datData,(int(datData.size/numAnalogCh),-1)).T
diodeData = {k: datData[k] for k in diodeChannels}

# ALIGN EPHYS DATA TO SYNC
get_edges = lambda key: [syncDataset.get_rising_edges(key, units='seconds'), syncDataset.get_falling_edges(key, units='seconds')]
syncBarcodeRising,syncBarcodeFalling = get_edges('barcode_ephys')
syncBarcodeTimes,syncBarcodes = extract_barcodes_from_times(syncBarcodeRising,syncBarcodeFalling)

datTimestamps = np.load(datTimestampsPath)
fullTimestamps = datTimestamps - datTimestamps[0]

ttlStates = np.load(ttlStatesPath)
ttlTimestamps = np.load(ttlTimestampsPath) - datTimestamps[0]

ephysBarcodeRising = ttlTimestamps[ttlStates>0]/ephysSampleRate
ephysBarcodeFalling = ttlTimestamps[ttlStates<0]/ephysSampleRate
ephysBarcodeTimes,ephysBarcodes = extract_barcodes_from_times(ephysBarcodeRising,ephysBarcodeFalling)

ephysShift,relSampleRate,endpoints = get_probe_time_offset(syncBarcodeTimes,syncBarcodes,ephysBarcodeTimes,ephysBarcodes,0,ephysSampleRate)
fullTimestamps = (fullTimestamps/relSampleRate) - ephysShift

# PLOT DIODE SIGNAL ALIGNED TO TIMESTAMPS
behavior_stim_table = stim_table.loc[stim_table['active']]
mapping_stim_table = stim_table.loc[stim_table['color']==-1]
passive_stim_table = stim_table.loc[stim_table['stimulus_block']==5]

def mean_aligned(stim_times, diodeData):
    aligned_traces = [[] for d in diodeData]
    for stimtime in stim_times:

        start_sample_ind = np.searchsorted(fullTimestamps, stimtime)
        mean_start = start_sample_ind - int(relSampleRate)
        mean_end = start_sample_ind + int(relSampleRate)

        for ichan, chan in enumerate(diodeData):
            aligned_traces[ichan].append(diodeData[chan][mean_start:mean_end])

    means = [np.mean(dt, axis=0) for dt in aligned_traces]
    means = [mean - np.mean(mean[:int(0.8*relSampleRate)]) for mean in means]
    time = np.linspace(-1, 1, len(means[0]))
    return means, time

def plot_diode(stim_times, diodeData, ax, legend=None, title=''):

    means, time = mean_aligned(stim_times, diodeData)
    [ax.plot(time, mean) for mean in means]
    ax.set_xlim([-0.02, 0.05])
    ax.axvline(0, linestyle='--')
    ax.set_xlabel('Time from stim onset (s)')
    ax.set_title(title)
    if legend:
        ax.legend(legend)

fig, axes = plt.subplots(1,3)
fig.set_size_inches([14, 6])
plot_diode(behavior_stim_table.start_time.values, diodeData, axes[0], None, 'behavior stim')
plot_diode(mapping_stim_table.start_time.values, diodeData, axes[1], None, 'mapping stim')
plot_diode(passive_stim_table.start_time.values, diodeData, axes[2], ['center of screen', 'top of screen'], 'passive_stim')
danielsf commented 2 years ago

test_8ms

@corbennett I'm being exceptionally paranoid because that is the kind of person I am.

I interpret your "add +8ms offset to the stimulus table" to mean "take the stimulus table we had and add 8ms to the start_time and end_time columns". The above is what your figure looks like if I do that. It looks like t=0 is now closer to mid-draw for the center of the frame, which I think is what you wanted.

Correct?

corbennett commented 2 years ago

Looks good!