Closed: whitleyo closed this issue 5 years ago
This is weird, can you send me the MOFA object (before training) via the Slack channel?
I replied to you via the Slack channel. Can you confirm whether the problem is solved?
I copy-paste the response here for other users:
The error should be fixed in the last commit; it was a numerical round-off error. Can you pull, re-install the python package* and try again?
Yet, the behaviour of shutting off factors after convergence is clearly reached is very unusual; it normally happens before that. With the new commit I achieve convergence with a small tolerance, but it takes quite some time (>5000 iterations). To speed it up I would recommend training with fewer factors, and perhaps just fixing their number (TrainOptions$DropFactorThreshold = 0) to something like 10 or 15. I often struggle to get interpretable factors beyond the ~15th, and, as in PCA, the choice of the number of factors is always a bit arbitrary.
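For reference, fixing the number of factors could look roughly like this with the MOFA R package (a sketch; the option names follow the package version current at the time, and `MOFAobject` is a placeholder for your untrained object, so adapt to your setup):

```r
library(MOFA)

# Fix the model to ~15 factors
ModelOptions <- getDefaultModelOptions(MOFAobject)
ModelOptions$numFactors <- 15

# Disable factor dropping so all 15 are kept during training
TrainOptions <- getDefaultTrainOptions()
TrainOptions$DropFactorThreshold <- 0

MOFAobject <- prepareMOFA(MOFAobject,
                          ModelOptions = ModelOptions,
                          TrainOptions = TrainOptions)
MOFAmodel <- runMOFA(MOFAobject)
```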
Hi,
It appears that I'm no longer having this issue. For a single run I couldn't get convergence within 10,000 iterations, but the error I was having previously is gone.
Thanks, Owen
As I mentioned in the previous comment, do you have convergence issues even when training with fewer factors? Otherwise you can just set the convergence threshold to ~2. From the variance explained plot the fit looks very good.
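In code, loosening the convergence threshold is a single training option (a config sketch with the same version-specific caveats as above):

```r
TrainOptions <- getDefaultTrainOptions()
TrainOptions$tolerance <- 2  # stop when deltaELBO drops below ~2
```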
Hi,
When I set the convergence threshold to 2, with 15 factors, using the dataset I sent, I obtain convergence in about 28 minutes. When I set the convergence threshold to 1.0, I do not obtain convergence within 10^6 iterations, which takes over 2 days. Looking at the model output, I see that getELBO(MOFA.obj) gives NaN.
Something I noticed when using multiple runs with different seeds was that the solutions found were inconsistent, i.e. the latent factors discovered were generally uncorrelated across runs. Settings in that case were a tolerance of 1.0, maxiter = 5000, and 10 runs with different seeds.
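One way to see that inconsistency concretely is to correlate the factor matrices of two runs by hand (a sketch; `getFactors` is the accessor I'd expect in the MOFA R package, and `model_seed1`/`model_seed2` are hypothetical names for two trained models, so check your version):

```r
# Z1, Z2: samples x factors matrices from two runs with different seeds
Z1 <- getFactors(model_seed1)
Z2 <- getFactors(model_seed2)

# Absolute Pearson correlations between every pair of factors. Robust
# runs give a near-permutation pattern (one high entry per row and
# column); inconsistent runs leave most entries low.
R <- abs(cor(Z1, Z2, use = "pairwise.complete.obs"))
print(round(R, 2))
```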
GitHub isn't accepting R notebook HTML output, so I can send you that document on Slack if you'd like.
Thanks again for all your help.
Hi, this is a very weird case that I have never come across before. The models are indeed very inconsistent and I will need to dig in more closely to find out what is happening. Can you send me the details via Slack?
P.S. What does the factor robustness plot look like with a high tolerance threshold (>1.0)?
Thanks.
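For context, the robustness check meant here is the cross-trial factor correlation; with several models trained under different seeds it can be produced roughly like this (a sketch, assuming the `compareFactors`/`compareModels` helpers exist in your package version, and with hypothetical model names):

```r
models <- list(model_seed1, model_seed2, model_seed3)
compareFactors(models)  # heatmap of factor correlations across trials
compareModels(models)   # final ELBO of each trial, to select the best one
```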
Hi,
I'm trying to run MOFA with GISTIC CNV data and vst-transformed RNA-seq data (DESeq2). If I set DropFactorThreshold to 0.01 and the tolerance to 1.0 in TrainOptions, at some point I see in the output an endless dropping of factors until none are left. If I set the tolerance to 5.0 then the model converges.
From the output I can see that the model appears to converge with deltaELBO ~2-5, and at some point a few jumps occur. Any ideas/suggestions on how to handle this issue?
Other relevant information: I played around with 20 factors, setting the tolerance to 5, and MOFA managed to find structure in the data.
Here's a subset of the code that failed
And the SessionInfo
Here's a snippet of the output:
And the point of failure