ctlab / fgsea

Fast Gene Set Enrichment Analysis
Other
379 stars 67 forks source link

fgsea never ending #19

Closed llrs closed 6 years ago

llrs commented 6 years ago

I have some problem for a while, it seems like fgsea never return the results:

system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000))
##    user  system elapsed 
##   0.168   0.116   0.517 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 20000))
##
## Timing stopped at: 0.28 0.416 731.2
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 1000))
##    user  system elapsed 
##   0.052   0.000   0.058 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 2000))
## 
## Timing stopped at: 0.204 0.32 147.1

So while the 10k permutations takes 0.116 and 0.516 for the system and in total, when I double it it takes much more and I had to stop the function (I had this same function running for 3 days without returning any value), despite being reported that to the user it took less than a second the output wasn't ready after 10 minutes. And a similar thing happened when I used 1k permutation and doubled to 2k.

Sorry I couldn't make it reproducible, I really don't understand what can be triggering this. Here is some information about the data I am testing:

length(comp1)
## 505
length(grouping)
## 127
table(lengths(grouping))
## 
##  1  2  3  4  5  6  7  8  9 11 12 14 21 22 42 
## 61 16 10 13  4  3  3  4  1  4  3  2  1  1  1 
quantile(comp1)
##         0%        25%        50%        75%       100% 
## -0.2198301  0.0000000  0.0000000  0.0000000  0.2639637 

Perhaps the problem is that ~78% of my stats are 0, and so they are equivalent in the position. But when I removed them I also had the same problem.

The session_info("fgsea"):

Session info -----------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   i686, linux-gnu             
 ui       RStudio (1.1.423)           
 language en_US                       
 collate  en_US.UTF-8                 
 tz       Europe/Madrid               
 date     2018-02-15                  

Packages ---------------------------------------------------------------------------------------------------------------------------------
 package        * version  date       source         
 assertthat       0.2.0    2017-04-11 CRAN (R 3.4.3) 
 BH               1.65.0-1 2017-08-24 CRAN (R 3.4.2) 
 BiocParallel     1.12.0   2017-11-02 Bioconductor   
 cli              1.0.0    2017-11-05 CRAN (R 3.4.3) 
 colorspace       1.3-2    2016-12-14 CRAN (R 3.4.2) 
 compiler         3.4.3    2017-12-01 local          
 crayon           1.3.4    2017-09-16 CRAN (R 3.4.2) 
 data.table     * 1.10.4-3 2017-10-27 CRAN (R 3.4.3) 
 dichromat        2.0-0    2013-01-24 CRAN (R 3.4.2) 
 digest           0.6.15   2018-01-28 CRAN (R 3.4.3) 
 fastmatch        1.1-0    2017-01-28 CRAN (R 3.4.3) 
 fgsea          * 1.4.1    2018-01-26 Bioconductor   
 futile.logger    1.4.3    2016-07-10 CRAN (R 3.4.2) 
 futile.options   1.0.0    2010-04-06 CRAN (R 3.4.2) 
 ggplot2        * 2.2.1    2016-12-30 CRAN (R 3.4.3) 
 graphics       * 3.4.3    2017-12-01 local          
 grDevices      * 3.4.3    2017-12-01 local          
 grid             3.4.3    2017-12-01 local          
 gridExtra        2.3      2017-09-09 CRAN (R 3.4.2) 
 gtable           0.2.0    2016-02-26 CRAN (R 3.4.2) 
 labeling         0.3      2014-08-23 CRAN (R 3.4.2) 
 lambda.r         1.2      2017-09-16 CRAN (R 3.4.2) 
 lazyeval         0.2.1    2017-10-29 CRAN (R 3.4.2) 
 magrittr         1.5      2014-11-22 CRAN (R 3.4.2) 
 MASS             7.3-48   2017-12-24 CRAN (R 3.4.3) 
 methods        * 3.4.3    2017-12-01 local          
 munsell          0.4.3    2016-02-13 CRAN (R 3.4.2) 
 parallel       * 3.4.3    2017-12-01 local          
 pillar           1.1.0    2018-01-14 CRAN (R 3.4.3) 
 plyr             1.8.4    2016-06-08 CRAN (R 3.4.2) 
 R6               2.2.2    2017-06-17 CRAN (R 3.4.3) 
 RColorBrewer   * 1.1-2    2014-12-07 CRAN (R 3.4.2) 
 Rcpp           * 0.12.15  2018-01-20 cran (@0.12.15)
 reshape2       * 1.4.3    2017-12-11 CRAN (R 3.4.3) 
 rlang            0.1.6    2017-12-21 CRAN (R 3.4.3) 
 scales           0.5.0    2017-08-24 CRAN (R 3.4.2) 
 snow             0.4-2    2016-10-14 CRAN (R 3.4.2) 
 stats          * 3.4.3    2017-12-01 local          
 stringi          1.1.6    2017-11-17 CRAN (R 3.4.2) 
 stringr          1.2.0    2017-02-18 CRAN (R 3.4.3) 
 tibble           1.4.2    2018-01-22 CRAN (R 3.4.3) 
 tools            3.4.3    2017-12-01 local          
 utf8             1.1.3    2018-01-03 CRAN (R 3.4.3) 
 utils          * 3.4.3    2017-12-01 local          
 viridisLite      0.3.0    2018-02-01 CRAN (R 3.4.3) 
assaron commented 6 years ago

Hi Lluís,

Does it sitll happen if you set nproc=1?

llrs commented 6 years ago

Same data:

system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000))
## 
## Timing stopped at: 0.068 0.12 43.18
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
##    user  system elapsed 
##   0.260   0.028   0.289 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 20000, nproc = 1))
# I had to stop it after ~5 minutes (and restart Rstudio)

# In a R --vanilla session (After loading the previous part of the script):
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
# I had to close the terminal after ~5 minutes
# In a R --vanilla session 
#(Only loading the data and some packages: fgsea, reactome.db, org.Hs.eg.db, data.table):
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
# I had to close the terminal after ~5 minutes

Thanks for the quick response

assaron commented 6 years ago

Is it specific for your grouping and comp1 objects? How about exampleRanks and examplePathways? If yes, can you send these objects (with save(grouping, comp1, file="fgsea_test.rda")) here or by e-mail?

llrs commented 6 years ago

Yes, it seems specific to my data (sorry I should have checked that before). I compressed here the data (in zip format) in order to be able to send it through the issue.

assaron commented 6 years ago

I can reproduce this. We'll look into it a bit later.

llrs commented 6 years ago

Oh 😮 , I didn't expect to be reproducible, I thought it was a problem in my computer 😭 That's worst than I expected. Let me know if there is anything I can do.

ToledoEM commented 6 years ago

It doesn't seem to be driven by the number of permutations by it self, since repeated execution lead to the same outcome.

library(fgsea)
set.seed(123456789)
for (i in 1:10){
print(system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 1000,nproc = 1)))
}

   user  system elapsed 
  0.082   0.002   0.064 
   user  system elapsed 
  0.838   0.008   0.153 
   user  system elapsed 
  0.222   0.002   0.038 
   user  system elapsed 
  0.230   0.003   0.039 
   user  system elapsed 
  0.222   0.002   0.038 
   user  system elapsed 
  0.215   0.003   0.037 
   user  system elapsed 
  0.214   0.002   0.036 
   user  system elapsed 
  0.208   0.002   0.035 
   user  system elapsed 
  0.226   0.002   0.038 
   user  system elapsed 
  0.238   0.003   0.042 

Then executing the same code

   user  system elapsed 
  0.031   0.001   0.036 
   user  system elapsed 
  0.028   0.000   0.029 
   user  system elapsed 
  0.028   0.000   0.028 
   user  system elapsed 
  0.028   0.000   0.029 

### Frozen for 5 minutes ---

Sometimes I can execute this twice or thrice without an error, sometimes just once.

Also reproducible if sort.comp1 <- rev(sort(comp1)) it is used.

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.3 magrittr_1.5   tools_3.4.3    pillar_1.1.0   glue_1.2.0     tibble_1.4.1   yaml_2.1.16    Rcpp_0.12.15   tidyr_0.7.2    rlang_0.1.6    purrr_0.2.4 
llrs commented 6 years ago

@ToledoEM Thanks, nice report. This makes me think that the error is in the c++ code.

assaron commented 6 years ago

It works now, however, current algorithm does not properly support equal ranks (zero or not) as the ranking is not full in such case. So be cautious about the results. See #18

llrs commented 6 years ago

Thanks for the fix.