AllenInstitute / visual_behavior_analysis

Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

mismatch in event timestamp length #700

Open dougollerenshaw opened 3 years ago

dougollerenshaw commented 3 years ago

The length of the timestamp array in the dataset events dataframe does not match the length of the filtered_events array.

For example:

from visual_behavior.data_access import loading
oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140014

len(dataset.events.iloc[0]['filtered_events'])

140012

Note that the length of the ophys_timestamps attribute matches the length of the filtered_events attribute, so it would seem that the 'timestamps' attribute of the events dataframe is the outlier.

len(dataset.ophys_timestamps)

140012

dougollerenshaw commented 3 years ago

I just checked another session and it's also off by two:

from visual_behavior.data_access import loading
oeid = 958435448
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140058

len(dataset.events.iloc[0]['filtered_events'])

140056

len(dataset.ophys_timestamps)

140056

dougollerenshaw commented 3 years ago

One more clue: the timestamps align at the beginning of the array, but not at the end. That'd seem to imply that the extra two timestamps in dataset.events.iloc[0]['timestamps'] are at the end: image
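A minimal sketch of how that start-versus-end check could be automated rather than read off a plot. The function name `locate_extra_samples` and the synthetic arrays are illustrative assumptions, standing in for `dataset.events.iloc[0]['timestamps']` and `dataset.ophys_timestamps`:

```python
import numpy as np

def locate_extra_samples(long_ts, short_ts, atol=1e-9):
    """Report whether the surplus entries of long_ts sit at the start or the end.

    Returns 'end', 'start', or 'unknown' (hypothetical helper, not part of VBA).
    """
    long_ts = np.asarray(long_ts, dtype=float)
    short_ts = np.asarray(short_ts, dtype=float)
    n = len(short_ts)
    if len(long_ts) <= n:
        raise ValueError("long_ts must be longer than short_ts")
    # Surplus at the end: the leading n samples match the reference.
    if np.allclose(long_ts[:n], short_ts, atol=atol, rtol=0):
        return "end"
    # Surplus at the start: the trailing n samples match the reference.
    if np.allclose(long_ts[-n:], short_ts, atol=atol, rtol=0):
        return "start"
    return "unknown"

# Synthetic stand-ins: 10 reference timestamps and a copy with 2 extra at the end
reference = np.arange(10) / 31.0
with_extra = np.arange(12) / 31.0
print(locate_extra_samples(with_extra, reference))  # prints "end"
```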

dougollerenshaw commented 3 years ago

@matchings are those timestamps inherited directly from the array that @ledochowitsch is saving to disk?

matchings commented 3 years ago

@dougollerenshaw Yes. For now, don't use the timestamps in the events df. Use dataset.ophys_timestamps. Those are the ground truth from SDK. I am not sure what could be causing the timestamps in the event detection output to be off.

dougollerenshaw commented 3 years ago

More evidence that the extra two timestamps at the end are extraneous:

from visual_behavior.data_access import loading
import matplotlib.pyplot as plt

oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

Plotting with last two timestamps trimmed off:

ax.plot(
    dataset.events.iloc[4]['timestamps'][:-2], #trim off last two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

gives us two aligned traces:

image

But trimming off the first two gives us misaligned traces:

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

ax.plot(
    dataset.events.iloc[4]['timestamps'][2:], # trim off first two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

image

ledochowitsch commented 3 years ago

Hey guys,

Which time stamps are you guys talking about?

Note that once you load the npz file, there are different sets of time_stamps in there:

there is npz['ts'], which should be identical to what you get from the SDK because it's just the result of

dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]

Then inside the event_dict object there is the key 'ts', which contains the time stamps for the detected events, for each cell id: npz['event_dict'][cid]['ts']

Those time stamps are computed by upsampling the original time stamps:

ts30Hz = resample_poly(ts, uf, 1)

figuring out where there are events in the upsampled events trace:

event_idx = np.where(event30Hz>0)[0] #upsampled

and finally indexing with that into the upsampled time stamp trace:

event_ts = ts30Hz[event_idx]

What appears to be wrong?

Best,

-Peter
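The upsample-and-index steps described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the event detection code itself: it substitutes linear interpolation (`np.interp`) for `resample_poly`, and the helper name `upsample_linear`, the upsampling factor, and all array values are made up. It also shows why a timestamp array that is longer than the trace does not raise an error at the indexing step:

```python
import numpy as np

def upsample_linear(ts, uf):
    """Upsample timestamps by an integer factor uf using linear interpolation.

    Stand-in for resample_poly(ts, uf, 1); the output has len(ts) * uf samples.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    # Fractional positions on the original sample-index grid
    fine_idx = np.arange(n * uf) / uf
    # np.interp holds the last value for positions past the final sample
    return np.interp(fine_idx, np.arange(n), ts)

uf = 3
ts = np.arange(8) / 10.0          # 8 frame timestamps (hypothetical values)
events_up = np.zeros(6 * uf)      # upsampled trace built from only 6 frames
events_up[[4, 10]] = 1.0          # two detected events

ts_up = upsample_linear(ts, uf)
event_idx = np.where(events_up > 0)[0]
event_ts = ts_up[event_idx]       # indexing succeeds even though lengths differ

print(len(ts_up), len(events_up))  # prints "24 18"
print(event_ts)
```

Because the indices come from the shorter array, nothing fails when the timestamp array has extra entries, so a mismatch of this kind can go unnoticed unless the lengths are compared explicitly.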

matchings commented 3 years ago

i believe the timestamps Doug is talking about are the npz['ts'] ones, not the ones in the event_dict.

ledochowitsch commented 3 years ago

That’s very mysterious then – I’m just passing those through…

-Peter

dougollerenshaw commented 3 years ago

Thanks @ledochowitsch. Sorry for being unclear about the underlying issue. I was initially struggling to understand it myself so this issue got a little muddied.

But the fundamental issue is this: the timestamps associated with the events are two values longer than the events arrays themselves.

For the same oeid I initially referenced in this issue, here's what happens when I go back to the cached events file:

import numpy as np
events_file = '//allen/programs/braintv/workgroups/nc-ophys/visual_behavior/event_detection/953443028.npz'
f = np.load(events_file, allow_pickle=True)

Get the length of the timestamps array:

len(f['ts'])

140014

Get the length of an events trace:

event_dict = f['event_dict'].item()
cell_roi_ids = list(event_dict.keys())
len(event_dict[cell_roi_ids[0]]['event_trace'])

140012

Above you said:

there is npz['ts'], which should be identical to what you get from the SDK

When I go back directly to the SDK, I get this:

from allensdk.brain_observatory.behavior.behavior_ophys_session import BehaviorOphysSession
oeid = 953443028
session = BehaviorOphysSession.from_lims(oeid)
len(session.ophys_timestamps)

140012

But you went on to say:

...because it's just the result of

dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]

Checking that myself, I see:

from visual_behavior.data_access import loading
dataset = loading.get_ophys_dataset(oeid, include_invalid_rois=False)
len(dataset.timestamps.ophys_frames['timestamps'])

140014

So it'd seem that the dataset.timestamps.ophys_frames['timestamps'] attribute is the source of the confusion here. @matchings, do you know where that attribute is coming from and why it would be two elements longer than ophys_timestamps?
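A check like the one above could be run over every cell in a cached events file. The sketch below uses a mock dictionary with the same keys mentioned in this thread (`event_dict`, `event_trace`); the helper name `find_length_mismatches`, the cell ids, and the array contents are assumptions for illustration:

```python
import numpy as np

def find_length_mismatches(ts, event_dict, trace_key="event_trace"):
    """Return {cell_id: len(trace) - len(ts)} for cells whose trace length differs from ts."""
    n_ts = len(ts)
    return {
        cid: len(entry[trace_key]) - n_ts
        for cid, entry in event_dict.items()
        if len(entry[trace_key]) != n_ts
    }

# Mock data shaped like the cached file described above (values are made up)
ts = np.linspace(0.0, 1.0, 14)
event_dict = {
    101: {"event_trace": np.zeros(12)},
    102: {"event_trace": np.zeros(12)},
}
print(find_length_mismatches(ts, event_dict))  # prints {101: -2, 102: -2}
```

An empty result means every trace has the same length as the timestamp array.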

ledochowitsch commented 3 years ago

Ah, I see... That’s frustrating:/.

The good news is that the contents of npz['events'] will be unaffected by this issue. However, the events time stamps will be off by the same two samples…

-Peter

matchings commented 3 years ago

dataset.timestamps.ophys_frames['timestamps'] are computed directly from the sync file, and these are what is used to create dataset.ophys_timestamps for mesoscope experiments, because the SDK does not yet do the proper time resampling for mesoscope (or at least it didn't in the version we are using). For Scientifica, dataset.ophys_timestamps is pulled directly from the SDK. If the SDK is doing some truncation of frames, it could lead to a discrepancy between dataset.ophys_timestamps and dataset.timestamps.ophys_frames['timestamps']. But that should be specific to Scientifica, because mesoscope uses the same thing for both. I hope that makes sense...

dougollerenshaw commented 3 years ago

Thanks @matchings. It looks like you're correct that this is specific to scientifica sessions. Here's an example from mesoscope showing that both the SDK ophys_timestamps and the VBA dataset ophys_frames['timestamps'] vectors are the same length:

image

And here's a different 2P3 (scientifica) session with the same off-by-two error as above:

image

So does this mean that the problem is with the SDK? If so, we should submit an SDK issue to solve it. These discrepancies will undoubtedly confuse other users in the future.

matchings commented 3 years ago

I'm guessing that the SDK truncates the timestamps to match the ophys traces, which is probably a desired behavior, otherwise we would have mismatches all over the place. I believe the Scientificas are known to give out a few extra TTL pulses at the end of the session (or at least MPE says it's at the end; it's nice that you just validated that here), which we want to remove so that everything is aligned. It surprises me that you are always seeing an off-by-exactly-2 issue though, because I thought the number of those extra pulses at the end was variable.
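A rough sketch of the kind of truncation being guessed at here: trim trailing timestamps so the array matches the number of imaged frames. This is not the SDK's implementation; the function name `truncate_timestamps` and the `max_extra` threshold are hypothetical, and the sketch assumes the extra pulses are always at the end of the session:

```python
import warnings
import numpy as np

def truncate_timestamps(timestamps, n_frames, max_extra=10):
    """Trim trailing timestamps so the array matches the number of imaged frames."""
    timestamps = np.asarray(timestamps)
    n_extra = len(timestamps) - n_frames
    if n_extra < 0:
        # Fewer timestamps than frames cannot be fixed by trimming
        raise ValueError(f"{-n_extra} fewer timestamps than frames; cannot align")
    if n_extra > max_extra:
        # max_extra is an arbitrary illustrative threshold, not a known rig limit
        warnings.warn(f"dropping {n_extra} trailing timestamps (more than {max_extra})")
    return timestamps[:n_frames]

sync_ts = np.arange(14) / 31.0    # 14 sync pulses (made-up values)
trimmed = truncate_timestamps(sync_ts, n_frames=12)
print(len(trimmed))               # prints 12
```

Raising on a shortfall and warning on an unusually large surplus keeps a silent trim from hiding a real acquisition problem.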

alexpiet commented 3 years ago

I'm always paranoid about convolutions. Should the mode be "valid" instead of "full" (default)?

https://github.com/AllenInstitute/visual_behavior_analysis/blob/0b07d4657b80431b328122efc6ef60122306b654/visual_behavior/ophys/response_analysis/response_processing.py#L422

https://numpy.org/doc/stable/reference/generated/numpy.convolve.html
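For reference, a small sketch of how the `mode` argument of `np.convolve` changes the output length. The trace and the 3-sample kernel are made up for illustration and are not taken from the linked code:

```python
import numpy as np

trace = np.zeros(100)       # stand-in for a 100-sample trace
kernel = np.ones(3) / 3.0   # hypothetical 3-sample smoothing kernel

# Output length for each convolution mode
lengths = {
    mode: len(np.convolve(trace, kernel, mode=mode))
    for mode in ("full", "same", "valid")
}
print(lengths)  # prints {'full': 102, 'same': 100, 'valid': 98}
```

With a 3-sample kernel, "full" returns two more samples than the input, "same" preserves the input length, and "valid" returns two fewer.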

ledochowitsch commented 3 years ago

I had in fact double-checked that the time stamps were the same length as the dff traces when I prototyped the code - on MesoScope data. Unfortunately, I did not re-check when I generalized it to also work for Scientifica. Who would have thunk?

-Peter
