GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/
Other
45 stars 14 forks source link

Some question about getAlternativePromoters() #41

Closed ninjaxfy closed 2 years ago

ninjaxfy commented 2 years ago

Hello: I have been employed proActiv in my reasearch and it is an excellent work!!! And your reply will do me a lot favor!!! However, when I read the source R code of the proActiv, I did come across some problemes in the [identify-alternative-promoters.R] file, line 113 in the fitPromoters function, the code pval[nonInternalId] <- unlist(lapply(seq_len(num.pros),function(i)tryCatch(summary(lm(unlist(assay[i,]) ~ condition))$coef[2,4], warning = function(cond) 1, error = function(cond) NaN))) , the coef[2,4] will not get the corresponding p-value of the liner model of the specified condition and absolute or relative activity.

For instance in my dataset, the upregulate result obtained by the function getAlternativePromoters()

test <- proActiv::getAlternativePromoters(result = rh_result,referenceCondition = "Ovary")
test$upReg
      promoterId             geneId      padjAbs      padjRel
2117        2117 ENSMMUG00000002145 7.124155e-03 3.173488e-02
3804        3804 ENSMMUG00000003854 2.052427e-10 4.620107e-03
4155        4155 ENSMMUG00000004228 1.362310e-02 2.714193e-02
4351        4351 ENSMMUG00000004419 1.046429e-07 9.349281e-03
4754        4754 ENSMMUG00000004808 2.423727e-08 1.642590e-02
5010        5010 ENSMMUG00000005076 4.361831e-07 8.165571e-04
5610        5610 ENSMMUG00000005733 1.939050e-10 3.440076e-11

And I use the fitPromoters() function in the data, selecting some data corresponding to the above result:

fit_pro <- fitPromoters(result = rh_result,referenceCondition = 'Ovary',type = 'absolute')
fit_pro[c("2117","3804","4155","4351"),]
     promoterId             geneId         pval         padj  abs.cond  abs.other gexp.cond gexp.other
2117       2117 ENSMMUG00000002145 2.279165e-03 7.124155e-03 1.3217218 0.59176266  6.973109   6.057004
3804       3804 ENSMMUG00000003854 2.193793e-11 2.052427e-10 1.7135613 0.80180591  4.227965   4.240371
4155       4155 ENSMMUG00000004228 4.634800e-03 1.362310e-02 0.2915928 0.08862209  3.377801   2.547155
4351       4351 ENSMMUG00000004419 1.543632e-08 1.046429e-07 3.0058058 1.22279272 14.598658  12.591623

ThefitPromoter() function padj result is the same as the getAlternativePromoters()

However, when I use the liner model to check the result: The p-value is not corresponding to the reference condition's p-value, e.g.

summary(lm(unlist(assay["2117",]) ~ condition))
Call:
lm(formula = unlist(assay["2117", ]) ~ condition)
Residuals:
    Min      1Q  Median      3Q     Max 
-1.6072 -0.4469 -0.2827  0.4137  2.6163 
Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     1.323e-15  2.054e-01   0.000  1.00000    
conditionBrain                  1.097e+00  3.557e-01   3.084  0.00228 ** 
conditionCerebellar.Cortex      3.564e-01  3.557e-01   1.002  0.31734    
conditionForebrain              4.617e-01  2.408e-01   1.917  0.05639 .  
conditionHeart                  3.820e-01  2.430e-01   1.572  0.11727    
conditionHindbrain              2.938e-01  2.717e-01   1.081  0.28061    
conditionKidney                 5.638e-01  2.348e-01   2.401  0.01711 *  
conditionLiver                  2.827e-01  2.287e-01   1.236  0.21760    
conditionMuscle                 6.021e-01  2.904e-01   2.073  0.03922 *  
conditionOvary                  1.322e+00  4.592e-01   2.878  0.00435 ** 
conditionPrefrontal.Cortex     -1.966e-15  3.557e-01   0.000  1.00000    
conditionPrimary.Visual.Cortex  7.712e-01  3.557e-01   2.168  0.03111 *  
conditionTestis                 1.607e+00  2.348e-01   6.844 6.19e-11 ***

we can see the p-value in the result actually is the p-value of the ’conditionBrain‘, because of the parameter summary(lm(unlist(assay[i,]) ~ condition))$coef[2,4]

All the above is my problems, thanks again for your time, your reply will do me a lot favor!!!

jleechung commented 2 years ago

Hi @ninjaxfy

The getAlternativePromoters function is implemented for two levels only, while your condition factor seems to have more than two levels.

You could try releveling your condition factor and see if that works.

ninjaxfy commented 2 years ago

Thanks a lot, I get it, your reply really do me a lot favor!!