Open dougollerenshaw opened 3 years ago
I just checked another session and it's also off by two:
from visual_behavior.data_access import loading
oeid = 958435448
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])
140058
len(dataset.events.iloc[0]['filtered_events'])
140056
len(dataset.ophys_timestamps)
140056
One more clue: the timestamps align at the beginning of the array, but not at the end. That'd seem to imply that the extra two timestamps in dataset.events.iloc[0]['timestamps'] are at the end:
@matchings are those timestamps inherited directly from the array that @ledochowitsch is saving to disk?
@dougollerenshaw Yes. For now, don't use the timestamps in the events df. Use dataset.ophys_timestamps. Those are the ground truth from SDK. I am not sure what could be causing the timestamps in the event detection output to be off.
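A quick way to flag affected rows before relying on any per-row timestamps is to compare every array length against the length of the ground-truth timestamps. The following is only a sketch: `find_length_mismatches` is a hypothetical helper, not part of visual_behavior or the SDK, and the rows here are synthetic stand-ins for records of the events dataframe.

```python
def find_length_mismatches(rows, reference_length,
                           columns=("timestamps", "filtered_events")):
    """Return (row_index, column, length) for every array whose length
    differs from reference_length."""
    mismatches = []
    for i, row in enumerate(rows):
        for col in columns:
            n = len(row[col])
            if n != reference_length:
                mismatches.append((i, col, n))
    return mismatches


# Synthetic rows: timestamps are two samples longer than the traces.
rows = [
    {"timestamps": [0.0] * 12, "filtered_events": [0.0] * 10},
    {"timestamps": [0.0] * 12, "filtered_events": [0.0] * 10},
]
print(find_length_mismatches(rows, reference_length=10))
# [(0, 'timestamps', 12), (1, 'timestamps', 12)]
```

With a real dataset, and assuming dataset.events is a pandas DataFrame, the rows could come from dataset.events.to_dict("records") and the reference length from len(dataset.ophys_timestamps).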
More evidence that the extra two timestamps at the end are extraneous:
from visual_behavior.data_access import loading
import matplotlib.pyplot as plt
oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)
fig,ax=plt.subplots()
ax.plot(
dataset.ophys_timestamps,
dataset.events.iloc[4]['filtered_events']
)
Plotting with last two timestamps trimmed off:
ax.plot(
dataset.events.iloc[4]['timestamps'][:-2], #trim off last two
dataset.events.iloc[4]['filtered_events'],
linestyle = ':',
linewidth = 3
)
ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)
gives us two aligned traces:
But trimming off the first two gives us misaligned traces:
fig,ax=plt.subplots()
ax.plot(
dataset.ophys_timestamps,
dataset.events.iloc[4]['filtered_events']
)
ax.plot(
dataset.events.iloc[4]['timestamps'][2:], # trim off first two
dataset.events.iloc[4]['filtered_events'],
linestyle = ':',
linewidth = 3
)
ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)
Hey guys,
Which time stamps are you guys talking about?
Note that once you load the npz file, there are different sets of time_stamps in there:
there is npz['ts'], which should be identical to what you get from the SDK because it’s just the result of
dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]
Then inside the event_dict object there is the key 'ts', which contains the time stamps for the detected events, for each cell id: npz['event_dict'][cid]['ts']
Those time stamps are computed by upsampling the original time stamps:
ts30Hz = resample_poly(ts, uf, 1)
figuring out where there are events in the upsampled events trace:
event_idx = np.where(event30Hz>0)[0] #upsampled
and finally indexing with that into the upsampled time stamp trace:
event_ts = ts30Hz[event_idx]
What appears to be wrong?
Best,
-Peter
I believe the timestamps Doug is talking about are the npz['ts'] ones, not the ones in the event_dict.
That’s very mysterious then – I’m just passing those through…
-Peter
Thanks @ledochowitsch. Sorry for being unclear about the underlying issue. I was initially struggling to understand it myself so this issue got a little muddied.
But the fundamental issue is this: the timestamps associated with the events are two values longer than the events arrays themselves.
For the same oeid I initially referenced in this issue, here's what happens when I go back to the cached events file:
import numpy as np
events_file = '//allen/programs/braintv/workgroups/nc-ophys/visual_behavior/event_detection/953443028.npz'
f = np.load(events_file, allow_pickle=True)
Get the length of the timestamps array:
len(f['ts'])
140014
Get the length of an events trace:
event_dict = f['event_dict'].item()
cell_roi_ids = list(event_dict.keys())
len(event_dict[cell_roi_ids[0]]['event_trace'])
140012
Above you said:
there is npz['ts'], which should be identical to what you get from the SDK
When I go back directly to the SDK, I get this:
from allensdk.brain_observatory.behavior.behavior_ophys_session import BehaviorOphysSession
oeid = 953443028
session = BehaviorOphysSession.from_lims(oeid)
len(session.ophys_timestamps)
140012
But you went on to say:
...because it’s just the result of
dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]
Checking that myself, I see:
from visual_behavior.data_access import loading
dataset = loading.get_ophys_dataset(oeid, include_invalid_rois=False)
len(dataset.timestamps.ophys_frames['timestamps'])
140014
So it'd seem that the dataset.timestamps.ophys_frames['timestamps'] attribute is the source of the confusion here. @matchings, do you know where that attribute is coming from and why it would be two elements longer than ophys_timestamps?
Ah, I see... That’s frustrating:/.
The good news is that the contents of npz['events'] will be unaffected by this issue. However, the events time stamps will be off by the same two samples…
-Peter
dataset.timestamps.ophys_frames['timestamps'] are computed directly from the sync file, and these are what is used to create dataset.ophys_timestamps for mesoscope experiments, because the SDK does not yet do the proper time resampling for mesoscope (or at least it didn't in the version we are using). For Scientifica, dataset.ophys_timestamps is pulled directly from the SDK. If the SDK is doing some truncation of frames, it could lead to a discrepancy between dataset.ophys_timestamps and dataset.timestamps.ophys_frames['timestamps']. But that should be specific to Scientifica, because mesoscope uses the same thing for both. I hope that makes sense...
Thanks @matchings. It looks like you're correct that this is specific to scientifica sessions. Here's an example from mesoscope showing that both the SDK ophys_timestamps and the VBA dataset ophys_frames['timestamps'] vectors are the same length:
And here's a different 2P3 (scientifica) session with the same off-by-two error as above:
So does this mean that the problem is with the SDK? If so, we should submit an SDK issue to solve it. These discrepancies will undoubtedly confuse other users in the future.
I'm guessing that the SDK truncates the timestamps to match the ophys traces, which is probably the desired behavior; otherwise we would have mismatches all over the place. I believe the Scientificas are known to give out a few extra TTL pulses at the end of the session (or at least MPE says it's at the end; it's nice that you just validated that here), which we want to remove so that everything is aligned. It surprises me that you are always seeing a mismatch of exactly 2, though, because I thought the number of those extra pulses at the end was variable.
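The kind of truncation guessed at above could look roughly like the sketch below. It assumes the surplus samples are always at the end, as the plots earlier in this thread suggest. `truncate_timestamps` and its `max_extra` threshold are hypothetical and are not taken from the SDK.

```python
def truncate_timestamps(timestamps, n_frames, max_extra=10):
    """Drop surplus timestamps from the end so the result has n_frames entries.

    max_extra is an assumed sanity limit: a larger surplus probably points
    to a real problem rather than a few stray TTL pulses.
    """
    n_extra = len(timestamps) - n_frames
    if n_extra < 0:
        raise ValueError(f"{-n_extra} fewer timestamps than frames")
    if n_extra > max_extra:
        raise ValueError(
            f"{n_extra} surplus timestamps exceeds max_extra={max_extra}")
    return timestamps[:n_frames]


# Synthetic example: 14 sync timestamps for 12 recorded frames.
sync_timestamps = [i / 31.0 for i in range(14)]
aligned = truncate_timestamps(sync_timestamps, n_frames=12)
print(len(aligned))
# 12
```

Raising on an unexpectedly large surplus, rather than silently trimming, keeps a variable number of extra pulses from hiding a genuine misalignment.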
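I'm always paranoid about convolutions. Should the mode be "valid" instead of "full" (default)?
https://github.com/AllenInstitute/visual_behavior_analysis/blob/0b07d4657b80431b328122efc6ef60122306b654/visual_behavior/ophys/response_analysis/response_processing.py#L422
https://numpy.org/doc/stable/reference/generated/numpy.convolve.html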
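For reference, the output lengths that the numpy.convolve documentation gives for each mode can be written down in a few lines. This is only an illustration of why the mode matters for array lengths. `convolve_output_length` is a hypothetical helper, and the kernel length of 3 is an assumption picked for the example, not the kernel used in the linked code.

```python
def convolve_output_length(n, m, mode="full"):
    """Output length of numpy.convolve for inputs of length n and m,
    following the lengths stated in the NumPy documentation."""
    if n < 1 or m < 1:
        raise ValueError("both inputs must be non-empty")
    if mode == "full":
        return n + m - 1
    if mode == "same":
        return max(n, m)
    if mode == "valid":
        return max(n, m) - min(n, m) + 1
    raise ValueError(f"unknown mode: {mode!r}")


n_frames = 140012   # trace length reported for oeid 953443028
kernel_length = 3   # assumed value, for illustration only
for mode in ("full", "same", "valid"):
    print(mode, convolve_output_length(n_frames, kernel_length, mode))
# full 140014
# same 140012
# valid 140010
```

Only "same" preserves the input length; "full" lengthens the output by kernel_length - 1 samples and "valid" shortens it by the same amount.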
I had in fact double-checked that the time stamps were the same length as the dff traces when I prototyped the code - on MesoScope data. Unfortunately, I did not re-check when I generalized it to also work for Scientifica. Who would have thunk?
-Peter
The length of the timestamp array in the dataset events dataframe does not match the length of the filtered_events array.
For example:
len(dataset.events.iloc[0]['filtered_events'])
Note that the length of the ophys_timestamps attribute matches the length of the filtered_events attribute, so it would seem that the 'timestamps' attribute of the events dataframe is the outlier.
len(dataset.ophys_timestamps)