const-ae / glmGamPoi

Fit Gamma-Poisson Generalized Linear Models Reliably
105 stars 14 forks source link

Problem in estimate_betas_fisher_scoring leads to "Error in estimate_overdispersions_fast" #25

Closed jan-glx closed 3 years ago

jan-glx commented 3 years ago

Full error:

Error in estimate_overdispersions_fast(y, mean, model_matrix, do_cox_reid_adjustment,  : 
  Evaluation error: all(!is.na(mean)) is not TRUE.

The issue seems to root in estimate_beta_fisher_scoring, which leads to NA estimates in some cases like this one: gist with data (2 genes)

do.call(estimate_betas_fisher_scoring, pars)
$Beta
         [,1]       [,2]    [,3]       [,4]      [,5]     [,6]       [,7]       [,8]      [,9]     [,10]       [,11]      [,12]      [,13]      [,14]       [,15]       [,16]      [,17]      [,18]       [,19]      [,20]     [,21]     [,22]      [,23]      [,24]     [,25]
[1,] 16.63781 -0.1256251 0.15645 -0.0754991 0.1646437 0.191263 0.03485183 0.03229181 0.2402343 0.3362655 -0.05429001 -0.1214602 0.04572174 0.08695161 -0.04867007 -0.08189775 -0.3380709 -0.2842984 -0.09034367 -0.3311708 0.1646174 0.1410605 0.07767562 -0.1840425 -0.175129
[2,]       NA         NA      NA         NA        NA       NA         NA         NA        NA        NA          NA         NA         NA         NA          NA          NA         NA         NA          NA         NA        NA        NA         NA         NA        NA
          [,26]      [,27]    [,28]     [,29]        [,30]      [,31]      [,32]      [,33]      [,34]      [,35]       [,36]      [,37]       [,38]       [,39]      [,40]      [,41]       [,42]       [,43]    [,44]      [,45]    [,46]       [,47]     [,48]     [,49]     [,50]
[1,] -0.3301341 -0.2993631 0.136859 0.1833964 -0.001078211 -0.1206512 -0.1775895 -0.2914664 -0.2173742 -0.1357657 -0.07481921 0.03626795 -0.02230796 -0.05025337 0.04157687 -0.2338234 -0.02899101 -0.04472247 0.182262 -0.3527863 0.129024 -0.01209915 0.2357902 0.1669566 0.2286588
[2,]         NA         NA       NA        NA           NA         NA         NA         NA         NA         NA          NA         NA          NA          NA         NA         NA          NA          NA       NA         NA       NA          NA        NA        NA        NA
           [,51]       [,52]      [,53]      [,54]     [,55]     [,56]       [,57]      [,58]        [,59]      [,60]      [,61]     [,62]     [,63]      [,64]       [,65]     [,66]      [,67]      [,68]       [,69]     [,70]     [,71]     [,72]     [,73]     [,74]     [,75]
[1,] -0.07738464 -0.01242978 0.09689969 -0.5510674 0.3017041 0.3198646 -0.04609768 -0.0157081 -0.003161268 -0.3200002 -0.1707829 0.3582172 0.5344889 -0.1207368 -0.05683072 0.1049338 0.04020218 -0.2073573 0.001379412 0.1023875 0.1685776 0.1113139 0.1662056 0.2080259 0.2338429
[2,]          NA          NA         NA         NA        NA        NA          NA         NA           NA         NA         NA        NA        NA         NA          NA        NA         NA         NA          NA        NA        NA        NA        NA        NA        NA
          [,76]      [,77]         [,78]
[1,] -0.1123109 0.02781702 -0.0009693966
[2,]         NA         NA            NA

$iterations
[1]    3 1000

$deviances
[1] 1.209236      NaN

Perhaps this hope is vain? https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L269

const-ae commented 3 years ago

Oh no...

Yeah, that may have been too optimistic :/ I'll try and have a look this afternoon, what's going on and will at least make sure that a failure to estimate beta for one gene won't make the whole function crash.

Thanks for digging into this and providing the example data :)

jan-glx commented 3 years ago

Awesome! My wild guess is that the initial guess is so good that there is no way to improve it.

const-ae commented 3 years ago

Hm, I cannot reproduce the behavior that you report :/

> do.call(estimate_betas_fisher_scoring, pars)
$Beta
         [,1]       [,2]       [,3]        [,4]      [,5]      [,6]       [,7]       [,8]      [,9]     [,10]       [,11]      [,12]       [,13]       [,14]       [,15]       [,16]        [,17]
[1,] 16.63781 -0.1256251 0.15644996 -0.07549910 0.1646437 0.1912630 0.03485183 0.03229181 0.2402343 0.3362655 -0.05429001 -0.1214602  0.04572174  0.08695161 -0.04867007 -0.08189775 -0.338070929
[2,] 17.45760 -0.1710280 0.01900898 -0.06383213 0.1749302 0.1395779 0.19676270 0.12752162 0.1337939 0.1531356 -0.10155641 -0.1321989 -0.11918980 -0.25121820  0.05298848  0.07026185 -0.008302768
          [,18]       [,19]       [,20]         [,21]       [,22]       [,23]       [,24]       [,25]       [,26]      [,27]      [,28]       [,29]        [,30]       [,31]       [,32]
[1,] -0.2842984 -0.09034367 -0.33117078  0.1646174219  0.14106053  0.07767562 -0.18404246 -0.17512905 -0.33013413 -0.2993631  0.1368590  0.18339637 -0.001078211 -0.12065117 -0.17758951
[2,]  0.0111428  0.05554734 -0.09019337 -0.0002925837 -0.07616395 -0.06578813  0.04880419  0.03457046 -0.05224566 -0.0364010 -0.1171842 -0.03201836 -0.120706440  0.02161945  0.01775851
           [,33]       [,34]       [,35]       [,36]       [,37]       [,38]       [,39]      [,40]         [,41]       [,42]       [,43]     [,44]       [,45]     [,46]       [,47]     [,48]
[1,] -0.29146643 -0.21737416 -0.13576574 -0.07481921  0.03626795 -0.02230796 -0.05025337 0.04157687 -0.2338234049 -0.02899101 -0.04472247 0.1822620 -0.35278634 0.1290240 -0.01209915 0.2357902
[2,] -0.07422066 -0.03375835 -0.04179356 -0.09057485 -0.14327812 -0.08505122  0.03673232 0.15294517  0.0002153319  0.02273975  0.01281928 0.1364525 -0.09209897 0.1266514 -0.01281459 0.1108201
          [,49]     [,50]       [,51]       [,52]      [,53]       [,54]     [,55]     [,56]       [,57]       [,58]        [,59]       [,60]      [,61]       [,62]       [,63]      [,64]
[1,]  0.1669566 0.2286588 -0.07738464 -0.01242978 0.09689969 -0.55106741 0.3017041 0.3198646 -0.04609768 -0.01570810 -0.003161268 -0.32000025 -0.1707829 0.358217171  0.53448892 -0.1207368
[2,] -0.1308798 0.2034883  0.07806760  0.01063464 0.20238523 -0.09685704 0.1566239 0.1183644 -0.05575549 -0.05457412  0.183428804 -0.08531331  0.0842248 0.001680852 -0.02319043 -0.2027130
           [,65]     [,66]      [,67]       [,68]       [,69]     [,70]        [,71]      [,72]      [,73]     [,74]     [,75]      [,76]       [,77]         [,78]
[1,] -0.05683072 0.1049338 0.04020218 -0.20735729 0.001379412 0.1023875 0.1685775519 0.11131393 0.16620555 0.2080259 0.2338429 -0.1123109  0.02781702 -0.0009693966
[2,] -0.02291194 0.1876742 0.08056478 -0.06049124 0.081296367 0.0828567 0.0008504246 0.06377887 0.05247491 0.3138343 0.3200137  0.1674442 -0.02250256  0.0378231420

$iterations
[1] 3 6

$deviances
[1] 1.2092362 0.9311362

My guess is that it is due to some architecture differences (you're on Windows, right?).

Well, that is a bit unfortunate. But I will just assume that sometimes fitBeta_fisher_scoring() returns NA's (although it shouldn't) and then will try to at least make sure that it won't crash the subsequent functions.

jan-glx commented 3 years ago

Oh no, me neither ( this was on a centos server). Perhaps the dput lost some precision, will try to get a real reproducible in a few minutes.

jan-glx commented 3 years ago

OK, I just sent you .rds files to your gmail address and checked them on another machine.

const-ae commented 3 years ago

Thanks for the files, I am able to reproduce the problem.

https://github.com/const-ae/glmGamPoi/commit/fa249d918a775ab94363afa14be9602362849f5b makes the package robust against internally produced NA's. I suspect there will still be some code paths that might crash, but the commit covers (I hope) the most important ones.

https://github.com/const-ae/glmGamPoi/commit/e1623e74f95e9ae8cdeb5f2b9a51c9ad7f82e993 adds a fallback mechanism with optim. If fitBeta_fisher_scoring does not converge for whatever reason, I try again with optim which hopefully can produce a useful estimate, even though it is much slower in general.

Right now, the code kind of works, but needs a lot more tests which I will try to implement tomorrow.

jan-glx commented 3 years ago

Thanks for implementing a workaround so quickly! With current master I no longer get the error (and also no warning). Since you write that you changed the point of reference for the deviance, I am wondering if that might impact the convergence criterion. Would an absolute change be more reasonable here?

const-ae commented 3 years ago

Since you write that you changed the point of reference for the deviance

I don't understand what you mean by this?


Great that it already works for you. I found one minor bug in relation to the ridge penalty in estimate_betas_optim (https://github.com/const-ae/glmGamPoi/commit/8ed7ba35d654c62b68f78a79b74523902d78c5ff), so it might be best for you to re-install the package.

jan-glx commented 3 years ago

I was referring to https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L179 Such a change will affect the absolute value of the deviance but not its absolute change in any step, therefore the employed relative convergence criterion https://github.com/const-ae/glmGamPoi/blob/a89a72544c6b36ffb44235436618b21e18fb740d/src/beta_estimation.cpp#L114 will probably be reached at another point. But might very well be that I am miss-understanding something here or that the NaNs have nothing to do with that (but rather with my large count values, for example)

const-ae commented 3 years ago

Ah, I see. Yes, it's quite possible that glmGamPoi converges differently from DESeq2. I assume that actually, the problem is in decrease_deviance: https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L118-L122

The idea is to employ a line search to only take steps that actually decrease the deviance. This makes the whole algorithm a lot more robust. However, under those very rare circumstances, it sometimes happens that even going 0.5^100 = 7*10^-31 times the original step in the wrong direction still doesn't decrease the deviance. I have no idea how it comes that the first step is so bad or if it is really just bad luck... I think your original example demonstrated that even a tiny change of precision from dput is enough to make the algorithm converge.