RfastOfficial / Rfast

A collection of Rfast functions for data analysis. Note 1: The vast majority of the functions accept matrices only, not data.frames. Note 2: Do not use matrices or vectors that have missing data (i.e. NAs). We do not check for them and C++ internally transforms them into zeros (0), so you may get wrong results. Note 3: In general, make sure you give the correct input in order to get the correct output. We do no checks, and this is one of the many reasons we are fast.

Is there a fast way to fit many ys to a single x using score.negbinregs() in Rfast or negbin.regs() in Rfast2? #48

Closed xiekunwhy closed 1 year ago

xiekunwhy commented 3 years ago

Hi,

Is there a fast way to fit many ys to a single x using score.negbinregs() in Rfast or negbin.regs() in Rfast2?

Best, Kun

statlink commented 3 years ago

Hi xiekunwhy.

The only reason we did not do it is that we thought no one would use it or ask for it. We will make it happen in Rfast2, but only for the "score" regressions, as that is much easier. For the regular regressions this will take some more time, as we have prioritised some other functions.

Michail

xiekunwhy commented 3 years ago

Hi,

Glad to hear that, and I hope it is coming soon. For my use case, I need to fit expression data for many genes (ys) to a single (or a few) other variables such as cell state and/or pseudo-development time (x(s)). The monocle package can do this by applying MASS::glm.nb(), but it is too slow.

Best, Kun

xiekunwhy commented 3 years ago

Hi Michail

Sorry to trouble you again, but I need to ask for some error-handling features, because not all ys are well behaved and it is very difficult to filter out the abnormal ones. Would you please add an option to return NAs or NULLs instead of stopping the function?

sigTest <- as.data.frame(t(apply(yexprs, 2, function(y) negbin.regs(y, xvalue, logged = FALSE))))

Error in while (abs(r1 - r2) > tol & r2 < 15) { : missing value where TRUE/FALSE needed

Best, Kun

statlink commented 3 years ago

Hi Kun,

It is not easy to include negbin.regs, as the person in charge of it is working full time. In fact, right now all collaborators are working full time in industry, except for me, who am at the university. That is why I said we have some other priorities. The score.negbinregs and the other score-related functions are my responsibility, so I will do those.

As for the error, I am afraid we cannot do anything there. We provide speed but do very few checks. The user is supposed to give us clean data. If we performed checks, that would slow down our procedures.

At the moment I would suggest using negbin.reg inside a loop and not negbin.regs.

Michail
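Since the functions do not check their input, one option is to screen the response matrix before fitting. The following is a minimal sketch in base R, not part of Rfast or Rfast2: `clean_count_columns` is a hypothetical helper, and the four conditions it tests (finite, non-negative, integer-valued, non-constant) are assumptions about what "clean data" means for a negative binomial fit.

```r
# Hypothetical helper (not part of Rfast/Rfast2): keep only the columns of a
# count matrix that look safe to pass to an unchecked negative binomial fit.
clean_count_columns <- function(ymat) {
  ok <- apply(ymat, 2, function(y) {
    all(is.finite(y)) &&        # no NA, NaN or Inf
      all(y >= 0) &&            # counts cannot be negative
      all(y == round(y)) &&     # integer-valued
      var(y) > 0                # not constant (e.g. all zeros)
  })
  ymat[, ok, drop = FALSE]
}

# Simulated example: one usable column and three problematic ones.
set.seed(1)
ymat <- cbind(good   = rnbinom(50, size = 10, prob = 0.7),
              has_na = c(NA, rnbinom(49, size = 10, prob = 0.7)),
              zeros  = rep(0, 50),
              frac   = rnbinom(50, size = 10, prob = 0.7) + 0.5)
kept <- clean_count_columns(ymat)
colnames(kept)
```

Dropping non-integer columns is only one choice; rounding them instead, if that suits the data, would keep those responses in the analysis.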

xiekunwhy commented 3 years ago

Hi Michail,

Things worked well (and fast enough) when I used the following script (with round(y) instead of y):

sigTest <- as.data.frame(t(apply(ymatrix, 2, function(y) tryCatch({score.negbinregs(round(y), xvalue, logged = FALSE)},error = function(x){return(c(NA, NA))}))))

sigTest2 <- as.data.frame(t(apply(ymatrix, 2, function(y) tryCatch({negbin.regs(round(y), xvalue, tol = 1e-03, logged = FALSE)}, error = function(x){return(c(NA, NA, NA, NA, NA, NA, NA, NA))}))))

I found that the p-values from negbin.regs are much smaller than those from score.negbinregs. Do you know why?

Another question: how do I get p-values when I use negbin.reg inside a loop instead of negbin.regs?

Best, Kun

statlink commented 3 years ago

It is natural for them not to be the same, as one is a log-likelihood ratio test whereas the other is a score test. See the reference: http://siba-ese.unisalento.it/index.php/ejasa/article/view/21017

To get p-values from negbin.reg in a loop you would need to perform a log-likelihood ratio test yourself. So, skip what I said and use negbin.regs; it's easier.
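For readers who still want to loop over responses, a log-likelihood ratio test can be sketched as follows. This is only an illustration under stated assumptions: it uses MASS::glm.nb (mentioned earlier in this thread) rather than negbin.reg, because the structure of negbin.reg's return value is not shown here; `nb_lrt` is a hypothetical helper and the data are simulated.

```r
library(MASS)  # glm.nb(); used here in place of negbin.reg (assumption)

# Hypothetical helper: likelihood ratio test of y ~ x against y ~ 1.
# Returns NA for both values if either fit throws an error.
nb_lrt <- function(y, x) {
  tryCatch({
    full <- glm.nb(y ~ x)   # model with the predictor
    null <- glm.nb(y ~ 1)   # intercept-only model
    stat <- as.numeric(2 * (logLik(full) - logLik(null)))
    df   <- length(coef(full)) - length(coef(null))
    c(stat = stat, pvalue = pchisq(stat, df, lower.tail = FALSE))
  }, error = function(e) c(stat = NA_real_, pvalue = NA_real_))
}

# Simulated data: one response related to x, one unrelated.
set.seed(1)
n <- 200
x <- rnorm(n)
ymat <- cbind(
  assoc = rnbinom(n, size = 2, mu = exp(0.5 + 0.5 * x)),
  noise = rnbinom(n, size = 2, mu = exp(0.5))
)

# One row per response, with the test statistic and its p-value.
res <- t(apply(ymat, 2, nb_lrt, x = x))
res
```

Because glm.nb re-estimates the dispersion for every fit, this will be much slower than negbin.regs, so it is best suited to checking a handful of responses.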

xiekunwhy commented 3 years ago

Hi Michail,

Thank you for your reply. Two more questions:

1) I got many "warning: solve(): system seems singular; attempting approx solution" messages in my loop. Will these warnings affect the results, and should I do something to avoid them?

2) There is no detailed documentation explaining the result matrix. ?negbin.regs only says "A matrix with the test statistic values and their relevant (logged) p-values." Would you please provide some details? For example:

library(MASS)
library(Rfast2)
y <- rnbinom(100, 10, 0.7)
x <- matrix( rnorm(100 * 3), ncol = 3 )
cdata <- as.data.frame(cbind(x,y))

quine.nb1 <- glm.nb(y ~ V2, data = cdata)

summary(quine.nb1)

anova(quine.nb1)

negbin.regs(y, x, tol = 1e-03, logged = FALSE)
         [,1]     [,2]      [,3]     [,4]         [,5]          [,6]      [,7]          [,8]
[1,] 442.8746 1330.003 1.1259118 460.7739 2.563414e-98 3.413357e-291 0.2886490 3.261973e-102
[2,] 442.8746 1331.111 0.5722107 460.3012 2.563414e-98 1.961245e-291 0.4493818 4.133802e-102
[3,] 442.8746 1331.942 0.1564839 459.6961 2.563414e-98 1.293745e-291 0.6924146 5.597784e-102

I only know that the 7th column is the p-value column (from comparing the glm.nb and negbin.regs results). Would you please explain the remaining columns?

Best, Kun

statlink commented 3 years ago

I need to see your data to evaluate your situation.

The fact that there are 7 or 8 columns is weird. There should be only two. I need to look into this issue.

statlink commented 3 years ago

Regarding the 2nd question, I had made a mistake. There should be 2 columns only. This will be fixed in the next update. Send me an email with your information so I can add your name to the acknowledgements.

Michail