AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

DFF traces have extremely large and unusual (also negative) values! #1669

Closed · farznaj closed this issue 3 years ago

farznaj commented 4 years ago

Describe the bug
dF/F traces of some neurons have extremely large and unusual (also negative) values! I found this in mesoscope experiments.

All the sessions below include neurons with such a problem:

session_id = [839208243, 839514418, 840490733, 842364341, 842623907, 843871999]

The corresponding experiment_ids:
[839716139, 839716141, 839716143, 839716145, 839716147, 839716149,
       839716151, 839716153, 840460366, 840460368, 840460370, 840460372,
       840460376, 840460378, 840460380, 840460383, 840717527, 840717529,
       840717531, 840717534, 840717536, 840717538, 840717540, 840717542,
       842545433, 842545435, 842545437, 842545439, 842545442, 842545444,
       842545446, 842545448, 843007050, 843007052, 843007054, 843007056,
       843007058, 843007061, 843007063, 843007065, 844420229]

To Reproduce

import visual_behavior.data_access.loading as loading
import numpy as np
import matplotlib.pyplot as plt

experiment_id = 839716153
dataset = loading.get_ophys_dataset(experiment_id, include_invalid_rois=False)
dff = np.vstack(dataset.dff_traces['dff']) # neurons x frames        
plt.plot(dff[2])

Another example

experiment_id = 839716147
dataset = loading.get_ophys_dataset(experiment_id, include_invalid_rois=False)
dff = np.vstack(dataset.dff_traces['dff']) # neurons x frames        
plt.plot(dff[10])

Expected behavior
dF/F values should not be so extreme!

Actual Behavior
See the attached figures. [attached figure: example dF/F trace with extreme values]


Additional context: N/A

Do you want to work on this issue? I don't have any available time.

matchings commented 4 years ago

There are actually two issues with dF/F traces having negative values. One is as Farzaneh describes above, with extreme outliers being present. The second is associated with the way dF/F is computed, using a rolling 3 minute mode to compute the baseline. Using the mode over a 3 minute window assumes that neural activity is sparse and that the mode will reflect the true baseline (where there is no activity). Given that the Visual Behavior stimulus involves repeated presentations of the same image, a cell that is highly responsive to that image will be highly active for a long period of time (potentially minutes), resulting in a mode that does not reflect true baseline activity in the absence of stimulus.

As a result, we need to reconsider this method of baseline calculation. One option is to make the rolling window large enough that it is not possible for the same stimulus to be shown for that long. Another option is to use the 10 minutes of spontaneous activity during the gray screen periods of the experiment to compute a baseline. There may be other options as well.
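To make the failure mode concrete, here is a minimal sketch (not the pipeline code) of a rolling-baseline dF/F computation; the median stands in for the mode, and the frame rate and window length are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import percentile_filter

def dff_rolling_baseline(trace, frame_rate=30.0, window_minutes=3.0):
    """dF/F with a rolling robust baseline (median as a stand-in for the mode)."""
    window = int(window_minutes * 60 * frame_rate)  # window length in frames
    # A cell that stays active for most of the window inflates this baseline,
    # which is exactly the failure mode described above.
    f0 = percentile_filter(trace, percentile=50, size=window)
    return (trace - f0) / f0
```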

An additional consideration with dF/F, which I believe also relates to negative values, comes up with event detection: to deal with some artifacts of the current dF/F calculation, the event detection code recalculates dF/F prior to running the event detection algorithm on the traces. Whatever changes were implemented in the event detection version of dF/F should be incorporated into the main dF/F computation, so that there is only one version of dF/F traces.

Apologies if this conflates multiple issues; I just wanted to lay out all the dF/F-related considerations so they are documented.

kschelonka commented 4 years ago

@matchings I wrote a document detailing the dF/F algorithm implemented in the event detection code, in descriptive language and pseudocode. There are some open questions regarding the application to Visual Behavior. Please take a look and let me know if you have any comments/questions: doc (internal link)

wbwakeman commented 4 years ago

@matchings @jeromelecoq @dougollerenshaw @saskiad @farznaj In preparation for the implementation of event detection into the 2p processing pipeline, @kschelonka has prepared a document describing how DF/F will be computed so that large negative events as described in this issue will no longer occur. Could you please review the document and provide feedback as you see fit?

matchings commented 4 years ago

I think it would be helpful if @mabuice also weighed in on this, given his familiarity with the event detection code and the dF/F algorithm used within it.

Here are the key questions listed in the linked document:

  1. How is the r value for neuropil traces created and where is it stored for the visual behavior data?

    • This is a question for @mabuice
  2. What is a long enough median filter to accurately compute the baseline? (Current default is 3 minutes, assuming data are sampled at 30 Hz)

    • I would like to hear others' perspectives on this, but my personal preference would be to test two methods: 1) use a longer window, such as 5-10 minutes, and 2) use the gray screen periods at the beginning and end of the session to compute a baseline. Option 1 is easiest but may not be sufficient to solve the problem, as it still differentially affects cells with different activity profiles over time (the median being an inaccurate estimate of baseline for cells or periods with high activity). Option 2 is more likely to provide a desirable outcome, as it guarantees that we exclude the high-activity periods during repeated visual stimulation, which are the source of the issue to begin with, but it requires a bit more work and has its own caveats, such as failing to account for slow drift, which a rolling baseline compensates for.
    • A set of plots would help decide between these options: for each cell, plot 1) dF/F computed as it is now, 2) dF/F computed with a longer rolling median window, say 6 minutes, and 3) dF/F computed using the median of the first 5 minutes of the session as the baseline. If we can compare the overlay of these three traces across many cells, we can evaluate which one sufficiently reduces the effect of pushing dF/F into negative values. The distribution of negative values can also be quantified directly.
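A rough sketch of the three comparison traces proposed above, reusing the `dff_rolling_baseline` helper sketched earlier in this thread (frame rate and window choices are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

def compare_baselines(trace, frame_rate=30.0):
    current = dff_rolling_baseline(trace, frame_rate, window_minutes=3.0)
    longer = dff_rolling_baseline(trace, frame_rate, window_minutes=6.0)
    # Fixed baseline: median of the first 5 minutes of the session.
    f0 = np.median(trace[: int(5 * 60 * frame_rate)])
    fixed = (trace - f0) / f0
    for dff, label in [(current, "3 min rolling"),
                       (longer, "6 min rolling"),
                       (fixed, "first-5-min median")]:
        plt.plot(dff, label=label, alpha=0.7)
    plt.legend()
    plt.ylabel("dF/F")
    plt.show()
```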
kschelonka commented 4 years ago

Thanks @matchings. Are the gray screen periods marked with any kind of stimulus stop/start events? I think it would be fine to make some plots using the first 5 minutes, but my concern would be that the experimental protocol could change in the future and this method would not be robust to that (unless it used events).

matchings commented 4 years ago

The end of the first 5 minute period coincides with the first stimulus start time in the stimulus_presentations attribute of the session object. The start of it is the start of the recording.

The start of the second 5 minute period coincides with the last stimulus end time in the stimulus_presentations attribute of the session object. The end of that 5 minute period happens when the natural movie starts, which currently is not included in the stimulus_presentations table, but really ought to be (there is probably a GitHub issue for that somewhere). A safe thing to do for now is to just take the first 4 mins after the last stimulus in the behavior task, then you know you're not impinging on the natural movie.
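Assuming `dataset` is the session object from the reproduction snippet above, a hedged sketch of locating those two windows (the `start_time`/`stop_time` column names should be checked against the actual `stimulus_presentations` schema):

```python
stim = dataset.stimulus_presentations
first_stim_start = stim["start_time"].min()
last_stim_stop = stim["stop_time"].max()

# Pre-task gray period: recording start up to the first stimulus.
pre_gray = (0.0, first_stim_start)
# Post-task gray period: only 4 minutes after the last behavior stimulus,
# to be sure of not impinging on the natural movie.
post_gray = (last_stim_stop, last_stim_stop + 4 * 60)
```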

saskiad commented 4 years ago

I don't think using the gray period is a good idea, for a couple of reasons.

saskiad commented 4 years ago

@kschelonka future experiments would definitely change in this regard. If using the first 5 minutes gets hardcoded in, it will have to get hardcoded out again in the future.

saskiad commented 4 years ago

@wbwakeman the link to the document doesn't work for me, can you send it via email?

kschelonka commented 4 years ago

@saskiad Try this (confluence link)

matchings commented 4 years ago

@saskiad I don't recall evaluating different dF/F methods...

I understand your concerns described above, but at this point, for this imminent and continually delayed data release, we need to make an empirical decision about what works well in practice for the Visual Behavior data and stimulus conditions. Things can change in the future, but we cannot hold up this data release in a search for the perfect solution.

I ask that we please just test a few different methods and identify the one that works the best with the data that is slated for release. If anyone has suggestions for additional methods to try, we can include those as well.

@kschelonka would it be possible for us to work together on this? If you can do an off-pipeline run of dF/F with the two versions described above (plus any others anyone suggests), and let me know where the output is, I can generate figures and evaluate the traces, then share those results here and ideally come to a decision soon.

saskiad commented 4 years ago

@matchings, nothing I wrote suggested not making an empirical decision. I pointed to specific data to support why I don't think using the gray period is a good idea. Option 2 uses the highest activity for some cells (VIP) and the lowest activity for other cells (SST/Pyr), which will impact the results you derive regarding the activities of those different populations. The stimulus configuration is unique to this specific experiment, so the method would have to change for future data (I can see that doesn't concern you, but it appears to concern the technology side, per Kat's question above). And using a single F0 value from one time during the experiment means the method is more susceptible to baseline drift across the experiment. You asked to hear others' perspectives; that is my perspective. I'm happy to look at traces for different methods when they're available.

matchings commented 4 years ago

Thanks @saskiad, I do value your opinion and perspective. I'm sorry if I was being dismissive; I'm having a combo of anxiety about the release and decision fatigue, so my capacity for reason is limited. You are probably right about using a single F0 value being problematic. I'm just worried that even a 5+ minute rolling mode still won't be long enough with our stimulus conditions (given the high false alarm rate in behavior contributing to extended stimulus presentations, and the cell-type-specific effects that depend on how selective cells are for different images), so I want to try anything we can think of so that we have options. There's likely no ideal solution given the various constraints, so I'm really hoping that one of the options will at least improve the situation relative to the current status.

mabuice commented 4 years ago

> I think it would be helpful if @mabuice also weighed in on this, given his familiarity with the event detection code and the dF/F algorithm used within it.
>
> Here are the key questions listed in the linked document:
>
> 1. How is the r value for neuropil traces created and where is it stored for the visual behavior data?
>
>    * This is a question for @mabuice

Are you asking me to summarize the neuropil subtraction algorithm? I assume we are using the same thing introduced for the Visual Coding pipeline; it is in the SDK and is described in the platform paper. In short, it is a cross-validated regression using a smoothness prior on the unmeasured cell trace. Quickly: fix an r, estimate the unmeasured trace using a smoothness prior with that r, then assess the error using that estimated trace, the fixed r, and the measured values. Take the r that minimizes this error over folds. Do that for each ROI.
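For readers unfamiliar with the procedure, here is a heavily simplified sketch of that cross-validated r selection (the real implementation lives in the AllenSDK neuropil subtraction module; the smoothness prior is reduced here to interpolation plus smoothing, purely for illustration):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def choose_r(roi_trace, neuropil_trace, n_folds=4, smooth_frames=31):
    r_grid = np.arange(0.0, 1.0, 0.05)
    frames = np.arange(roi_trace.size)
    errors = np.zeros_like(r_grid)
    for i, r in enumerate(r_grid):
        fold_errors = []
        for k in range(n_folds):
            test = frames % n_folds == k
            train = ~test
            # Estimate the unmeasured cell trace from the training frames,
            # with smoothing standing in for the smoothness prior.
            est = np.interp(frames, frames[train],
                            roi_trace[train] - r * neuropil_trace[train])
            est = uniform_filter1d(est, smooth_frames)
            # Score the fixed r against the held-out measured values.
            pred = est[test] + r * neuropil_trace[test]
            fold_errors.append(np.mean((roi_trace[test] - pred) ** 2))
        errors[i] = np.mean(fold_errors)
    # Take the r that minimizes the cross-validated error; repeat per ROI.
    return r_grid[np.argmin(errors)]
```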

As to where this is stored, not being the person who manages the pipeline code, I have no idea. I would think Wayne is the person to ask.

As to dF/F itself, I've advocated before for switching over to what we use for event detection.

kschelonka commented 4 years ago

@mabuice I wanted to make sure that I understood what the inputs and outputs to event detection were. Since they are results from the SDK, I can figure it out; if they were coming from elsewhere, I would have needed more info about how they were generated and where they were saved.

@matchings @saskiad It seems like the best path to move forward would be to run the data for a few different experimental protocols using the strategy options discussed. Do you have any preferences for choosing test data or shall I pick random experiments from visual coding and visual behavior?

It is possible on the technology side to use different methods for different experimental protocols.

saskiad commented 4 years ago

@kschelonka The Visual Coding stimulus does not have the same structure, so I think you have to use only Visual Behavior experiments. @matchings is better suited to identify which ones.

saskiad commented 4 years ago

I believe neuropil r values are in a file called neuropil_correction.h5. I'm not certain, but that's my best guess.
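If that guess is right, the values could be inspected directly with h5py (the dataset key "r" is an assumption to verify by listing the file's keys):

```python
import h5py

with h5py.File("neuropil_correction.h5", "r") as f:
    print(list(f.keys()))  # inspect what the file actually contains
    r_values = f["r"][()]  # hypothetical key; adjust to the real one
```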

kschelonka commented 4 years ago

@saskiad Is the current method working well for visual coding? If there's no desire for change there we can just investigate what works for visual behavior data, while keeping the option for the 3 minute median (currently used in the event detection code) for the updated dff algorithm.

saskiad commented 4 years ago

@kschelonka I have no complaints with the current method - we do the vast majority of the analysis using the events and I'm satisfied with it.

matchings commented 4 years ago

@kschelonka any experiments listed in the experiments table returned by the following code would be good to use for Visual Behavior (function returns all experiments passing QC). I will also email you the list of IDs.

import visual_behavior.data_access.loading as loading
experiment_table = loading.get_filtered_ophys_experiment_table()

kschelonka commented 4 years ago

Thanks @saskiad and @matchings.

I propose that we make the algorithm used to compute the baseline a configurable parameter that is unique to the experimental protocol. The visual coding data will use the 3 minute median as before, and we will investigate the best choice for visual behavior. Do you all agree?
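A minimal sketch of what that protocol-keyed configuration could look like; the names and values are illustrative assumptions, not pipeline code:

```python
BASELINE_CONFIG = {
    # Visual Coding keeps the 3 minute rolling median as before.
    "visual_coding": {"method": "rolling_median", "window_minutes": 3.0},
    # Visual Behavior window to be determined by the comparison above.
    "visual_behavior": {"method": "rolling_median", "window_minutes": None},
}

def get_baseline_params(protocol):
    return BASELINE_CONFIG[protocol]
```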

matchings commented 4 years ago

@kschelonka sounds good to me

kschelonka commented 4 years ago

It seems like there is a bug in the mesoscope data splitting. I wonder if that could be causing these weird spikes.

Edit: Confirmed there is a bug where timestamps are truncated improperly, but I don't know if it's related to this.

saskiad commented 3 years ago

what was the decision on this?

On Fri, Nov 6, 2020 at 9:43 AM Wayne Wakeman notifications@github.com wrote:

Closed #1669 https://github.com/AllenInstitute/AllenSDK/issues/1669.


kschelonka commented 3 years ago

@saskiad We are using the dF/F trace algorithm from the event detection code. Visual Behavior will use a 10 minute median filter for the baseline to start with (this may change later).
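In terms of the sketch from earlier in the thread, the decision amounts to something like the following (the frame rate here is an assumption; mesoscope planes are sampled more slowly than 30 Hz):

```python
dff = dff_rolling_baseline(trace, frame_rate=30.0, window_minutes=10.0)
```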