YosefLab / ImpulseDE2

37 stars 10 forks source link

Impulse parameters don't seem to fit? #12

Closed nicolerg closed 4 years ago

nicolerg commented 4 years ago

Hello,

I am confused by the interpretation of the impulse parameters. I am using the definitions of h0, h1, h2, t1, t2, and beta from Figure 1 of this paper by Chechik and Koller: h0 = initial amplitude h1 = peak amplitude h2 = steady-state amplitude t1 = onset time t2 = offset time beta = slope of first and second transitions

I ran runImpulseDE2 with a data set containing RNA-seq raw counts for 14,203 across 34 samples in 8 time points with the following command:

impulse_results = runImpulseDE2(matCountData = counts_round, 
                                dfAnnotation = annot,
                                boolCaseCtrl = FALSE, 
                                vecConfounders = NULL, 
                                scaNProc = 14,
                                scaQThres = 0.05, 
                                boolIdentifyTransients = TRUE,
                                boolVerbose = TRUE)

I pulled the impulse parameters from the impulse_results ImpulseDE2 object like this:

library(data.table)
fit = get_lsModelFits(obj=impulse_results)
params_list = list()
for (gene in names(fit$case)){
  fit_params = data.table(t(fit$case[[gene]]$lsImpulseFit$vecImpulseParam))
  fit_params[,ENSEMBL := gene]
  params_list[[gene]] = fit_params
}
all_params = rbindlist(params_list)

3,962 genes are differentially expressed if I apply cutoffs of padj < 0.01 & abs(log(h1/h0)) > 0.2 (where log(h1/h0) is a rough interpretation of logFC if h0 is the initial amplitude and h1 is the peak amplitude).

When I tried to filter down some of these DE genes by applying other thresholds based on the impulse parameters, I noticed something strange. 1,936 of the 3,962 genes are removed if I apply this filter: (beta > 0 & h0 < h1) | (beta < 0 & h0 > h1). Am I misinterpreting beta? How can h0 be greater than h1 if beta is positive or vice versa?

I then plotted a single one of the genes excluded by this filtering step that I also expect to be highly differentially expressed due to my perturbation.

Here's what it looks like with the package's built-in plotGenes() function:

gene = 'ENSRNOG00000018294'
lsgplotsGenes <- plotGenes(
  vecGeneIDs       = gene,
  objectImpulseDE2 = impulse_results,
  boolCaseCtrl     = FALSE)
g1 = lsgplotsGenes[[1]]
g1

Screen Shot 2020-02-11 at 10 00 25 AM

And here is what the impulse parameters are for that gene:

gene = 'ENSRNOG00000018294'
fit = get_lsModelFits(obj=impulse_results)
test = fit$case[[gene]]$lsImpulseFit$vecImpulseParam
test

Screen Shot 2020-02-11 at 10 02 17 AM Why is h1 so much less than h0?

This is what it looks like if you plot the the vecImpulseValue values from fit$case[[gene]]$lsImpulseFit$vecImpulseValue and annotate it with h0 (black), h1 (red), h2 (blue): Screen Shot 2020-02-11 at 10 04 49 AM

Am I misunderstanding the definitions of these parameters or how to access them from the ImpulseDE2 object? It would be really valuable to get the resolution achieved by these parameters.

nicolerg commented 4 years ago

It seems like this is an issue of multiple solutions. If I plug the parameters in to the functions defined in the Checknik and Koller paper, the fitted values follow the same curve as shown by plotGenes(). The parameters just aren't directly interpretable (there probably exists a solution in which they are if there were some constraints).

davidsebfischer commented 4 years ago

Hi @nicolerg, thanks for the issue and great to see that you are digging in so deep into this function class! So first thing to notice, that you also noticed, about your coefficient estimates is that t1 > t2. This means that h0, h1, h2 probably do not directly work as you would expect them based on the "initial plateau, final" concept of the impulse model. Secondly, in general, these values are not always actually reached depeding on slope on transition points, note the h* are limit values. Note that the gene that you selected still has a unimodal profile, so I would argue that it s not necessary to constraint t1 and t2 with respect to each other here. All evaluations of the functional profile (ie maxima etc) in ImpulseDE2 are performed based on fitted values, not on coefficients, so this non-intuitive behaviour of h* is not a problem. I would also recommend doing this if you need any functional characteristics other than the ones provided by ImpulseDE2. Two more considerations from a developer point of view:

  1. If those genes that you filtered now were filtered out, we would miss clear DE hits such as the one you presented.
  2. If we constrained t1 and t2 with respect to each other, we might harm convergence (it s a difficult optimisation problem) or may not even reach the maximum likelihood estimator which would harm the statistical tests.

Hope this helps!

nicolerg commented 4 years ago

Hi @davidsebfischer, thank you for the quick response! I can understand why you wouldn't want to add additional constraints from a developer's point of view.

We are more interested in getting interpretable impulse parameters than calling DE genes with this package, so it sounds like I would want to do something similar when you say "All evaluations of the functional profile (ie maxima etc) in ImpulseDE2 are performed based on fitted values, not on coefficients." Would you be able to point me to the function or chunk of code that infers these parameters? h0, h1, h2 are straightforward, but I am struggling to understand how beta is calculated since the values look much smaller than the slopes of the lines in these plots with units of adjusted counts. Thank you for your time!

davidsebfischer commented 4 years ago

What i meant with "functional profile" is that we always look at the fitted values: ie instead of looking at h1 you can for example look at the extremal value between the two endpoints of the time interval. Or, instead of looking at h2, you can look at the impulse model value (fitted value) at the highest time point fitted. This is more insightful for the actual trend that was fitted, even if t1<t2, as h* are only limit values of sigmoids that may never be reached by the fitted curve, depending on the parameter estimates. If you are interested in slopes, you can locally approximate the gradient with a finite difference, for example at the position at which the fitted values change the fastest. I hope this makes some sense, let me know if it doesnt!

nicolerg commented 4 years ago

Thanks, that matches my intuition for estimating h0, h1, h2 (in the loose sense that they match the definitions for these parameters that I'm interested in, not actually defining parameters for an independent impulse curve fit).

Can you please clarify what you mean by locally approximating the gradient? If I use the true impulse parameters to get time and expression values for a gene across my time course, and I find an estimate of time/x where the slope is maximal, how can I approximate the gradient in practice? Sorry if this is a naive question, gradients and meshes are completely out of my wheelhouse.

davidsebfischer commented 4 years ago

No worries! Let f(t) be the fitted impulse function.

  1. Evaluate f(t) on t=[tmin, step, tmax] to give y
  2. Find the switch points by looking for maximal and minimal delta y = y_{i+1} - y_i
  3. The approximate gradient at such a selected time point ti is `(y{i+1} - y_i )/ step`
nicolerg commented 4 years ago

That makes perfect sense - thank you very much! I really appreciate your responsiveness!