Closed jan-glx closed 3 years ago
Oh no...
Yeah, that may have been too optimistic :/ I'll try and have a look this afternoon, what's going on and will at least make sure that a failure to estimate beta for one gene won't make the whole function crash.
Thanks for digging into this and providing the example data :)
Awesome! My wild guess is that the initial guess is so good that there is no way to improve it.
Hm, I cannot reproduce the behavior that you report :/
> do.call(estimate_betas_fisher_scoring, pars)
$Beta
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] 16.63781 -0.1256251 0.15644996 -0.07549910 0.1646437 0.1912630 0.03485183 0.03229181 0.2402343 0.3362655 -0.05429001 -0.1214602 0.04572174 0.08695161 -0.04867007 -0.08189775 -0.338070929
[2,] 17.45760 -0.1710280 0.01900898 -0.06383213 0.1749302 0.1395779 0.19676270 0.12752162 0.1337939 0.1531356 -0.10155641 -0.1321989 -0.11918980 -0.25121820 0.05298848 0.07026185 -0.008302768
[,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
[1,] -0.2842984 -0.09034367 -0.33117078 0.1646174219 0.14106053 0.07767562 -0.18404246 -0.17512905 -0.33013413 -0.2993631 0.1368590 0.18339637 -0.001078211 -0.12065117 -0.17758951
[2,] 0.0111428 0.05554734 -0.09019337 -0.0002925837 -0.07616395 -0.06578813 0.04880419 0.03457046 -0.05224566 -0.0364010 -0.1171842 -0.03201836 -0.120706440 0.02161945 0.01775851
[,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
[1,] -0.29146643 -0.21737416 -0.13576574 -0.07481921 0.03626795 -0.02230796 -0.05025337 0.04157687 -0.2338234049 -0.02899101 -0.04472247 0.1822620 -0.35278634 0.1290240 -0.01209915 0.2357902
[2,] -0.07422066 -0.03375835 -0.04179356 -0.09057485 -0.14327812 -0.08505122 0.03673232 0.15294517 0.0002153319 0.02273975 0.01281928 0.1364525 -0.09209897 0.1266514 -0.01281459 0.1108201
[,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64]
[1,] 0.1669566 0.2286588 -0.07738464 -0.01242978 0.09689969 -0.55106741 0.3017041 0.3198646 -0.04609768 -0.01570810 -0.003161268 -0.32000025 -0.1707829 0.358217171 0.53448892 -0.1207368
[2,] -0.1308798 0.2034883 0.07806760 0.01063464 0.20238523 -0.09685704 0.1566239 0.1183644 -0.05575549 -0.05457412 0.183428804 -0.08531331 0.0842248 0.001680852 -0.02319043 -0.2027130
[,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78]
[1,] -0.05683072 0.1049338 0.04020218 -0.20735729 0.001379412 0.1023875 0.1685775519 0.11131393 0.16620555 0.2080259 0.2338429 -0.1123109 0.02781702 -0.0009693966
[2,] -0.02291194 0.1876742 0.08056478 -0.06049124 0.081296367 0.0828567 0.0008504246 0.06377887 0.05247491 0.3138343 0.3200137 0.1674442 -0.02250256 0.0378231420
$iterations
[1] 3 6
$deviances
[1] 1.2092362 0.9311362
My guess is that it is due to some architecture differences (you're on Windows, right?).
Well, that is a bit unfortunate. But I will just assume that sometimes fitBeta_fisher_scoring()
returns NA's (although it shouldn't) and then will try to at least make sure that it won't crash the subsequent functions.
Oh no, me neither ( this was on a centos server). Perhaps the dput lost some precision, will try to get a real reproducible in a few minutes.
OK, I just sent you .rds files to your gmail address and checked them on another machine.
Thanks for the files, I am able to reproduce the problem.
https://github.com/const-ae/glmGamPoi/commit/fa249d918a775ab94363afa14be9602362849f5b makes the package robust against internally produced NA
's. I suspect there will still be some code paths that might crash, but the commit covers (I hope) the most important ones.
https://github.com/const-ae/glmGamPoi/commit/e1623e74f95e9ae8cdeb5f2b9a51c9ad7f82e993 adds a fallback mechanism with optim
. If fitBeta_fisher_scoring
does not converge for whatever reason, I try again with optim which hopefully can produce a useful estimate, even though it is much slower in general.
Right now, the code kind of works, but needs a lot more tests which I will try to implement tomorrow.
Thanks for implementing a workaround so quickly! With current master I no longer get the error (and also no warning). Since you write that you changed the point of reference for the deviance, I am wondering if that might impact the convergence criterion. Would an absolute change be more reasonable here?
Since you write that you changed the point of reference for the deviance
I don't understand what you mean by this?
Great that it already works for you. I found one minor bug in relation to the ridge penalty in estimate_betas_optim
(https://github.com/const-ae/glmGamPoi/commit/8ed7ba35d654c62b68f78a79b74523902d78c5ff), so it might be best for you to re-install the package.
I was referring to https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L179 Such a change will affect the absolute value of the deviance but not its absolute change in any step, therefore the employed relative convergence criterion https://github.com/const-ae/glmGamPoi/blob/a89a72544c6b36ffb44235436618b21e18fb740d/src/beta_estimation.cpp#L114 will probably be reached at another point. But might very well be that I am miss-understanding something here or that the NaNs have nothing to do with that (but rather with my large count values, for example)
Ah, I see. Yes, it's quite possible that glmGamPoi
converges differently from DESeq2
. I assume that actually, the problem is in decrease_deviance
:
https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L118-L122
The idea is to employ a line search to only take steps that actually decrease the deviance. This makes the whole algorithm a lot more robust. However, under those very rare circumstances, it sometimes happens that even going 0.5^100 = 7*10^-31 times the original step in the wrong direction still doesn't decrease the deviance. I have no idea how it comes that the first step is so bad or if it is really just bad luck... I think your original example demonstrated that even a tiny change of precision from dput
is enough to make the algorithm converge.
Full error:
The issue seems to root in
estimate_beta_fisher_scoring
, which leads toNA
estimates in some cases like this one: gist with data (2 genes)Perhaps this hope is vain? https://github.com/const-ae/glmGamPoi/blob/6c5c93118f21ca9f663d233ab96404b27dfd5f59/src/beta_estimation.cpp#L269