**Closed** — farznaj closed this issue 3 years ago
There are actually 2 issues with dF/F traces having negative values. One is as Farzaneh describes above, with extreme outliers being present. A second issue is associated with the way dF/F is computed, using a rolling 3 minute mode to compute the baseline. Using the mode over a 3 minute window assumes that neural activity is sparse, and that the mode will reflect the true baseline (where there is no activity). Given that the Visual Behavior stimulus involves repeated presentations of the same image, if a cell is highly responsive to that image, it will be highly active for a long period of time (potentially minutes), resulting in a mode that is not reflective of true baseline activity in the absence of stimulus. As a result, we need to reconsider this method of baseline calculation. One option is to make the rolling window sufficiently large that it is not possible for the same stimulus to be shown for that long. Another option is to use the 10 minutes of spontaneous activity during the gray screen periods of the experiment to compute a baseline. There may be other options as well.
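The rolling-mode baseline described above can be sketched as follows. This is a minimal illustration of the idea, not the pipeline code, and the function and parameter names are mine; it shows why a window shorter than a sustained response lets the mode drift away from the true baseline.

```python
import numpy as np
from scipy import stats

def dff_rolling_mode(trace, frame_rate=30.0, window_minutes=3.0):
    """Illustrative dF/F with a rolling-mode baseline (not pipeline code).

    For each frame, the baseline F0 is the mode of the fluorescence
    values in a centered window, and dF/F = (F - F0) / F0. If a cell
    is active for most of the window (e.g. repeated presentations of
    a preferred image), the mode overestimates F0 and dF/F goes
    negative during true baseline periods.
    """
    n = len(trace)
    half = int(window_minutes * 60 * frame_rate) // 2
    f0 = np.empty(n)
    for i in range(n):
        window = trace[max(0, i - half):min(n, i + half)]
        # Mode of a rounded copy of the window approximates the most
        # common fluorescence level (the presumed resting baseline).
        f0[i] = stats.mode(np.round(window), keepdims=False).mode
    return (trace - f0) / f0
```

A naive per-frame loop like this is slow on real traces; it is written for clarity, not speed.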
An additional consideration with dF/F, that I believe also relates to negative values, comes up with event detection - to deal with some artifacts of the current dF/F calculation, the event detection code recalculates dF/F prior to running the event detection algorithm on the traces. Whatever changes were implemented in the event detection version of dF/F should be incorporated into the main dF/F computation, so that there is only one version of dF/F traces.
Apologies if this compounds multiple issues here, I just wanted to lay out all the dF/F related considerations so they are documented.
@matchings I wrote a document detailing the dF/F algorithm implemented in the event detection code, in descriptive language and pseudocode. There are some open questions regarding the application to Visual Behavior. Please take a look and let me know if you have any comments/questions: doc (internal link)
@matchings @jeromelecoq @dougollerenshaw @saskiad @farznaj In preparation for the implementation of event detection into the 2p processing pipeline, @kschelonka has prepared a document describing how DF/F will be computed so that large negative events as described in this issue will no longer occur. Could you please review the document and provide feedback as you see fit?
I think it would be helpful if @mabuice also weighed in on this, given his familiarity with the event detection code and the dF/F algorithm used within it.
Here are the key questions listed in the linked document:
1. How is the r value for neuropil traces created and where is it stored for the visual behavior data?
2. What is a long enough median filter to accurately compute the baseline? (The current default is 3 minutes, assuming data are sampled at 30 Hz.)
Thanks @matchings. Are the gray screen periods marked with any kind of stimulus stop/start events? I think it would be fine to make some plots using the first 5 minutes, but my concern would be that the experimental protocol could change in the future and this method would not be robust to that (unless it used events).
The end of the first 5 minute period coincides with the first stimulus start time in the stimulus_presentations attribute of the session object. The start of it is the start of the recording. The start of the second 5 minute period coincides with the last stimulus end time in the stimulus_presentations attribute of the session object. The end of that 5 minute period happens when the natural movie starts, which currently is not included in the stimulus_presentations table, but really ought to be (there is probably a GitHub issue for that somewhere). A safe thing to do for now is to just take the first 4 minutes after the last stimulus in the behavior task; then you know you're not impinging on the natural movie.
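The epoch boundaries described above could be extracted roughly as follows. This is a sketch under assumptions: it presumes a session object whose `stimulus_presentations` table exposes `start_time` and `stop_time` columns (the column names are my guess), and the helper name is hypothetical.

```python
# Sketch of extracting the gray-screen baseline windows described above.
# Assumes `session.stimulus_presentations` has `start_time` and
# `stop_time` columns (column names are assumptions, not confirmed API).

def gray_screen_windows(session, recording_start=0.0, post_minutes=4.0):
    stim = session.stimulus_presentations
    first_start = min(stim["start_time"])
    last_stop = max(stim["stop_time"])
    # First epoch: recording start up to the first stimulus presentation.
    pre_window = (recording_start, first_start)
    # Second epoch: only `post_minutes` after the last stimulus, so we
    # don't impinge on the natural movie (not yet in the table).
    post_window = (last_stop, last_stop + post_minutes * 60.0)
    return pre_window, post_window
```

Keying the windows off the stimulus table, rather than hardcoding "first 5 minutes", addresses the robustness concern raised earlier in the thread.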
I don't think using the gray period is a good idea, for a couple of reasons:
1. When we looked at different dF/F methods (and my recollection is you (@matchings) did the bulk of that work), fixed F0 values were worse than ones that shifted across the experiment. I think we will end up seeing large drifts in dF/F using this method.
2. Different cells have different spontaneous activity rates (see extended data figure 1 of the platform paper). You could end up in a situation where, e.g., VIP cells all have negative dF/F for the majority of the experiment, which then poses problems for event detection.
@kschelonka future experiments would definitely change in this regard. If using the first 5 minutes gets hardcoded in, it will have to get hardcoded out again in the future.
@wbwakeman the link to the document doesn't work for me, can you send it via email?
@saskiad Try this (confluence link)
@saskiad I don't recall evaluating different dF/F methods...
I understand your concerns described above, but at this point, for this imminent and continually delayed data release, we need to make an empirical decision about what works well in practice for the Visual Behavior data and stimulus conditions. Things can change in the future, but we cannot hold up this data release in a search for the perfect solution.
I ask that we please just test a few different methods and identify the one that works the best with the data that is slated for release. If anyone has suggestions for additional methods to try, we can include those as well.
@kschelonka would it be possible for us to work together on this? If you can do an off-pipeline run of dF/F with the two versions described above (plus any others anyone suggests), and let me know where the output is, I can generate figures and evaluate the traces, then share those results here and ideally come to a decision soon.
@matchings, nothing I wrote suggested not making an empirical decision. I pointed to specific data to support why I don't think using the gray period is a good idea. Option 2 uses the highest-activity period for some cells (VIP) and the lowest for others (SST/Pyr), which will affect the results you derive regarding the activities of those populations. The stimulus configuration is unique to this specific experiment, so the method will have to change for future data (I can see that doesn't concern you, but it appears to concern the technology team, per Kat's question above). And using a single F0 value from one time during the experiment means the method is more susceptible to baseline drift across the experiment. You asked to hear others' perspectives; that is my perspective. I'm happy to look at traces for the different methods when they're available.
Thanks @saskiad, I do value your opinion and perspective. I’m sorry if I was being dismissive, I’m having a combo of anxiety about the release and decision fatigue so my capacity for reason is limited. You are probably right about using a single F0 value being problematic, I’m just worried that even a 5+ minute rolling mode still won’t be long enough with our stimulus conditions (and high false alarm rate in behavior contributing to extended stimulus presentations) (and it can have cell type specific effects depending on how selective cells are for different images), so I want to try anything we can think of so that we have options. There’s likely no ideal solution given the various constraints so I’m really hoping that one of the options will at least improve the situation relative to the current status.
> I think it would be helpful if @mabuice also weighed in on this, given his familiarity with the event detection code and the dF/F algorithm used within it.
> Here are the key questions listed in the linked document:
> 1. How is the r value for neuropil traces created and where is it stored for the visual behavior data?
>    - This is a question for @mabuice
Are you asking me to summarize the neuropil subtraction algorithm? I assume we are using the same one introduced for the Visual Coding pipeline, which is in the SDK and is described in the platform paper. In short, it is a cross-validated regression using a smoothness prior on the unmeasured cell trace. Quickly: fix an r, estimate the unmeasured trace using a smoothness prior with that r, assess the error using that estimated trace, the fixed r, and the measured values. Take the r that minimizes this error over folds. Do that for each ROI.
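The procedure sketched above can be illustrated with a heavily simplified grid search. This is not the pipeline implementation (which uses a cross-validated regression with a proper smoothness prior); here the smoothness prior is replaced by a short moving average, and the function name and grid are mine.

```python
import numpy as np

def estimate_neuropil_r(f_roi, f_neuropil, r_grid=None):
    """Heavily simplified sketch of neuropil r estimation
    (illustrative only; the real pipeline cross-validates a
    regression with a smoothness prior on the cell trace)."""
    if r_grid is None:
        r_grid = np.arange(0.0, 1.01, 0.05)

    def smooth(x, k=5):
        # Stand-in for the smoothness prior: a short moving average.
        return np.convolve(x, np.ones(k) / k, mode="same")

    errors = []
    for r in r_grid:
        # For this candidate r, estimate the unmeasured cell trace...
        f_cell = smooth(f_roi - r * f_neuropil)
        # ...and score how well f_cell + r * f_neuropil reconstructs
        # the measured ROI trace.
        errors.append(np.mean((f_roi - (f_cell + r * f_neuropil)) ** 2))
    return r_grid[int(np.argmin(errors))]
```

The corrected trace is then `f_roi - r * f_neuropil` with the winning r.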
As to where this is stored, not being the person who manages the pipeline code, I have no idea. I would think Wayne is the person to ask.
As to dF/F itself, I've advocated before for switching over to what we use for event detection.
@mabuice I wanted to make sure that I understood what the inputs and outputs to event detection were. Since they are results from the SDK then I can figure it out, but if they were coming from elsewhere I would have needed more info about how they were generated/where they were saved.
@matchings @saskiad It seems like the best path to move forward would be to run the data for a few different experimental protocols using the strategy options discussed. Do you have any preferences for choosing test data or shall I pick random experiments from visual coding and visual behavior?
It is possible on the technology side to use different methods for different experimental protocols.
@kschelonka The Visual coding stimulus does not have the same stimulus structure, so I think you have to only use visual behavior experiments. @matchings is better suited to identify which ones.
I believe neuropil r values are in a file called neuropil_correction.h5. I'm not certain, but that's my best guess.
@saskiad Is the current method working well for visual coding? If there's no desire for change there we can just investigate what works for visual behavior data, while keeping the option for the 3 minute median (currently used in the event detection code) for the updated dff algorithm.
@kschelonka I have no complaints with the current method - we do the vast majority of the analysis using the events and I'm satisfied with it.
@kschelonka any experiments listed in the experiments table returned by the following code would be good to use for Visual Behavior (function returns all experiments passing QC). I will also email you the list of IDs.
```python
import visual_behavior.data_access.loading as loading

experiment_table = loading.get_filtered_ophys_experiment_table()
```
Thanks @saskiad and @matchings.
I propose that we make the algorithm used to compute the baseline a configurable parameter that is unique to the experimental protocol. The visual coding data will use the 3 minute median as before, and we will investigate the best choice for visual behavior. Do you all agree?
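The proposal above could look something like the following. The dictionary layout, keys, and function name are illustrative only (not pipeline code); the Visual Behavior window is left unset because it is still under evaluation at this point in the thread.

```python
# Sketch of the proposed protocol-specific baseline configuration
# (names and layout are illustrative assumptions, not pipeline code).
DFF_BASELINE_CONFIG = {
    # Keep the existing behavior for Visual Coding.
    "visual_coding": {"method": "rolling_median", "window_minutes": 3.0},
    # Window length still under evaluation for Visual Behavior.
    "visual_behavior": {"method": "rolling_median", "window_minutes": None},
}

def baseline_params(protocol):
    # Default to the 3 minute median used by the event detection code.
    return DFF_BASELINE_CONFIG.get(
        protocol, {"method": "rolling_median", "window_minutes": 3.0}
    )
```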
@kschelonka sounds good to me
It seems like there is a bug in the mesoscope data splitting. I wonder if that could be causing these weird spikes.
Edit: Confirmed there is a bug where timestamps are truncated improperly, but I don't know if it's related to this.
what was the decision on this?
On Fri, Nov 6, 2020 at 9:43 AM Wayne Wakeman notifications@github.com wrote:
Closed #1669 https://github.com/AllenInstitute/AllenSDK/issues/1669.
@saskiad we are using the dF/F trace algorithm from the event detection code. Visual Behavior will use a 10 minute median filter for the baseline to start out with (this may change later).
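The decision above (rolling-median baseline, 10 minute window) can be sketched as follows. This is a minimal illustration, not the pipeline implementation, and the function and parameter names are mine.

```python
import numpy as np
from scipy.ndimage import median_filter

def dff_rolling_median(trace, frame_rate=30.0, window_minutes=10.0):
    """Illustrative dF/F with a rolling-median baseline (not the
    pipeline code; defaults assume ~30 Hz sampling)."""
    size = int(window_minutes * 60 * frame_rate)
    # Rolling median as the baseline F0. A 10 minute window spans many
    # image presentations, so a sustained response to a preferred image
    # is less likely to dominate the baseline than with a 3 minute window.
    f0 = median_filter(trace, size=size, mode="nearest")
    return (trace - f0) / f0
```

The median is also more robust than the mode to the extreme outliers reported in this issue.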
**Describe the bug**
DFF traces of some neurons have extremely large and unusual (also negative) values! I found this for mesoscope experiments.
All the sessions below include neurons with such a problem:
**To Reproduce**
```python
import visual_behavior.data_access.loading as loading
import numpy as np
import matplotlib.pyplot as plt
```
Another example
**Expected behavior**
dff values should not be so extreme!

**Actual behavior**
See the attached figures.

**Environment (please complete the following information):**

**Additional context**
NA
Do you want to work on this issue? I don't have any available time.