FunGeST / Palimpsest

An R package for studying mutational signatures and structural variant signatures along clonal evolution in cancer.
Error when changing nrun and range_of_sigs parameters #9

Closed AndreVidas closed 5 years ago

AndreVidas commented 5 years ago

Hi Palimpsest team.

I'm working on the 560 breast cancer dataset from Nik-Zainal et al. (https://doi.org/10.1038/nature17676), trying to reproduce their rearrangement signatures. I can obtain the signatures with nrun = 30 & range_of_sigs = 12. However, runs on the same data with other settings of nrun and range_of_sigs produce many errors:

When changing NRUN

nrun = 50 & range_of_sigs = 12

[1] "Estimating the optimal number of mutational signatures..."

Timing stopped at: 193.7 38.88 318.3
Timing stopped at: 158.8 29.58 334.8
Timing stopped at: 205.5 40.71 419.8
Timing stopped at: 285.6 44.92 375
Timing stopped at: 252.6 35.51 507.2
Timing stopped at: 402.6 49.61 454
Timing stopped at: 396.7 52.06 537.6
Timing stopped at: 453.8 49.97 563.3
Timing stopped at: 434.3 50.08 532.1
Timing stopped at: 264.8 44.34 374.2
Timing stopped at: 394.1 39.35 707.8
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#2 [r=3] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#3 [r=4] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#4 [r=5] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#5 [r=6] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6 [r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#7 [r=8] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#8 [r=9] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#9 [r=10] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#10 [r=11] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#11 [r=12] -> (list) object cannot be coerced to type 'double' [in call t
Calls: deconvolution_nmf -> nmfEstimateRank -> do.call ->
Execution halted

nrun = 70 & range_of_sigs = 12

[1] "Estimating the optimal number of mutational signatures..."

Timing stopped at: 142.4 39.54 239
Timing stopped at: 220.7 36.57 334.9
Timing stopped at: 245.6 45.64 367.7
Timing stopped at: 215.7 37.63 372.5
Timing stopped at: 387 42.69 504.4
Timing stopped at: 283.5 43 395.7
Timing stopped at: 426.1 50.33 558.4
Timing stopped at: 395.5 48.59 528.5
Timing stopped at: 575.1 50.45 626.4
Timing stopped at: 228.7 39.92 312.6
Timing stopped at: 375 44.68 539.6
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#2 [r=3] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#3 [r=4] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#4 [r=5] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#5 [r=6] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6 [r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#7 [r=8] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#8 [r=9] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#9 [r=10] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#10 [r=11] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#11 [r=12] -> (list) object cannot be coerced to type 'double' [in call t
Calls: deconvolution_nmf -> nmfEstimateRank -> do.call ->
Execution halted

nrun = 80 & range_of_sigs = 12

[1] "Calculating exposures of signatures in the input tumors"

Timing stopped at: 117.7 22.11 426.2
Timing stopped at: 188 35.73 410.3
Timing stopped at: 289.2 35.57 496.8
Timing stopped at: 241.9 41.26 342.2
Timing stopped at: 482.6 48.48 632.1
Timing stopped at: 234.3 49.22 290.3
Timing stopped at: 273.6 30.33 814
Timing stopped at: 179 41.72 303.1
Timing stopped at: 454.7 40.21 905.5
Timing stopped at: 759.5 52.74 1031
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Error: NMF::nmf - invalid argument 'rank': must be a single numeric value
In addition: Warning messages:
1: In (function (...) : NAs were produced due to errors in some of the runs:
-#1[r=2] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#2[r=3] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#3[r=4] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#4[r=5] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#5[r=6] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6[r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#7[r=8] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#8[r=9] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#9[r=10] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#10[r=11] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
2: Removed 40 rows containing missing values (geom_path).
3: Removed 100 rows containing missing values (geom_point).
4: In max(abs(diff(z))) : no non-missing arguments to max; returning -Inf
Execution halted

When changing RANGE_OF_SIGS

nrun = 60 & range_of_sigs = 8

[1] "Estimating the optimal number of mutational signatures..."

Timing stopped at: 219.6 40.97 309.6
Timing stopped at: 290.9 42.98 352
Timing stopped at: 243 35.3 454.1
Timing stopped at: 131.2 20.06 530.5
Timing stopped at: 224.1 28.04 671.6
Timing stopped at: 301.6 38.47 530.2
Timing stopped at: 545.8 52.13 795.9
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#2 [r=3] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#3 [r=4] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#4 [r=5] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#5 [r=6] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6 [r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#7 [r=8] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
Calls: deconvolution_nmf -> nmfEstimateRank -> do.call ->
Execution halted

Do you have any idea of why this is happening?

Best regards, André

sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /cm/shared/apps/intel/parallel_studio_xe/2018_update2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] getopt_1.20.2 Palimpsest_1.0.0 GenomicRanges_1.32.7
[4] GenomeInfoDb_1.16.0 IRanges_2.14.12 S4Vectors_0.18.3
[7] BiocInstaller_1.30.0 NMF_0.21.0 bigmemory_4.5.33
[10] Biobase_2.40.0 BiocGenerics_0.26.0 cluster_2.0.7-1
[13] rngtools_1.3.1 pkgmaker_0.27 registry_0.5
[16] RColorBrewer_1.1-2 RCircos_1.2.0 bedr_1.0.4
[19] RLinuxModules_0.2

loaded via a namespace (and not attached):
[1] lsa_0.73.1 bitops_1.0-6
[3] matrixStats_0.54.0 bit64_0.9-7
[5] doParallel_1.0.14 progress_1.2.0
[7] httr_1.3.1 Rgraphviz_2.24.0
[9] SnowballC_0.5.1 tools_3.5.0
[11] R6_2.3.0 KernSmooth_2.23-15
[13] DBI_1.0.0 lazyeval_0.2.1
[15] colorspace_1.3-2 withr_2.1.2
[17] tidyselect_0.2.5 prettyunits_1.0.2
[19] bit_1.1-14 compiler_3.5.0
[21] VennDiagram_1.6.20 graph_1.58.2
[23] formatR_1.5 DelayedArray_0.6.6
[25] rtracklayer_1.40.6 caTools_1.17.1.1
[27] scales_1.0.0 stringr_1.3.1
[29] digest_0.6.18 Rsamtools_1.32.3
[31] R.utils_2.7.0 XVector_0.20.0
[33] pkgconfig_2.0.2 bibtex_0.4.2
[35] plotrix_3.7-4 BSgenome_1.48.0
[37] rlang_0.3.0.1 RSQLite_2.1.1
[39] bindr_0.1.1 gtools_3.8.1
[41] BiocParallel_1.14.2 dplyr_0.7.7
[43] R.oo_1.22.0 VariantAnnotation_1.26.1
[45] RCurl_1.95-4.11 magrittr_1.5
[47] GenomeInfoDbData_1.1.0 futile.logger_1.4.3
[49] Matrix_1.2-15 Rcpp_1.0.0
[51] munsell_0.5.0 R.methodsS3_1.7.1
[53] stringi_1.2.4 yaml_2.2.0
[55] SummarizedExperiment_1.10.1 zlibbioc_1.26.0
[57] gplots_3.0.1 plyr_1.8.4
[59] grid_3.5.0 blob_1.1.1
[61] gdata_2.18.0 bigmemory.sri_0.1.3
[63] crayon_1.3.4 lattice_0.20-38
[65] Biostrings_2.48.0 GenomicFeatures_1.32.3
[67] hms_0.4.2 pillar_1.3.0
[69] reshape2_1.4.3 codetools_0.2-15
[71] biomaRt_2.36.1 futile.options_1.0.1
[73] XML_3.98-1.16 glue_1.3.0
[75] lambda.r_1.2.3 data.table_1.11.8
[77] foreach_1.4.4 testthat_2.0.1
[79] gtable_0.2.0 purrr_0.2.5
[81] assertthat_0.2.0 ggplot2_3.1.0
[83] gridBase_0.4-7 xtable_1.8-3
[85] tibble_1.4.2 iterators_1.0.10
[87] GenomicAlignments_1.16.0 AnnotationDbi_1.42.1
[89] memoise_1.1.0 bindrcpp_0.2.2

jayendrashinde91 commented 5 years ago

Hi! Thank you for your query. Here is an example call to deconvolution_nmf():

denovo_signatures <- deconvolution_nmf(input_data = propSVsByCat, type = "SV", range_of_sigs = 2:12, nrun = 20, method = "brunet", resdir = resdir)

The argument 'range_of_sigs' takes a range of candidate numbers of mutational signatures (here 2:12), from which the most stable number of signatures is estimated.

If you have already established the exact number of signatures to extract from your input data, the call would be as follows:

denovo_signatures <- deconvolution_nmf(input_data = propSVsByCat, type = "SV", num_of_sigs = 12, nrun = 20, method = "brunet", resdir = resdir)

The argument 'num_of_sigs' takes the exact number of signatures you provide. In your case, that would be nrun = 50 & num_of_sigs = 12.

Best regards, Jay
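The difference between the two arguments described above can be sketched in base R. The helper below is hypothetical and not part of Palimpsest; it only illustrates the rule that a single value is an exact number of signatures, while several values form a range of candidates.

```r
# Hypothetical helper (NOT part of Palimpsest): decide which argument a
# requested number of signatures should be passed to.
choose_sig_argument <- function(n_sigs) {
  stopifnot(is.numeric(n_sigs), length(n_sigs) >= 1, all(n_sigs >= 1))
  if (length(n_sigs) == 1) {
    # A single value is an exact number of signatures to extract.
    list(num_of_sigs = n_sigs)
  } else {
    # Several values are candidate numbers to compare for stability.
    list(range_of_sigs = sort(unique(n_sigs)))
  }
}

str(choose_sig_argument(12))    # a list with element num_of_sigs
str(choose_sig_argument(2:12))  # a list with element range_of_sigs
```

Under this reading, a setting written as "range_of_sigs = 12" would correspond to num_of_sigs = 12 rather than to the range 2:12.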

AndreVidas commented 5 years ago

Hi again, sorry for my incorrect description. When I wrote range_of_sigs = 12, I actually meant 2:12, so I used an interval rather than a single integer for range_of_sigs, and I still get the error message. The same problem arises when I choose different values of nrun, as described. For some reason it still works for nrun = 30 & range_of_sigs = 2:12 and for nrun = 60 & range_of_sigs = 2:12.

I also noticed that the toy example data in Palimpsest_test_script.R produces an error when range_of_sigs is changed. With nrun = 20 & range_of_sigs = 2:7 on the toy data, I get the following error:

Error: NMF::nmf - invalid argument 'rank': must be a single numeric value
In addition: Warning message:
In max(abs(diff(z))) : no non-missing arguments to max; returning -Inf

Best, André

jayendrashinde91 commented 5 years ago

Hi André,

Thanks for bringing up this issue. There was a bug in how the most stable number of signatures was determined in certain situations, and we have pushed a fix. If you reinstall the package and rerun, you should get your estimated number of signatures. Let me know if this update helps.

Thanks, Jay

AndreVidas commented 5 years ago

All right, thanks.

I tried again after the update and it definitely works better. The following settings, which failed before, now work:

nrun = 50 & range_of_sigs = 12
nrun = 70 & range_of_sigs = 12
nrun = 80 & range_of_sigs = 12

I'm also able to make the toy data work with different settings of range_of_sigs (nrun = 20 & range_of_sigs = 2:7), which didn't work before either.

However, when I try other settings of range_of_sigs on the large 560 cancer data, I still get error messages:

nrun = 60 & range_of_sigs = 1:8

Warning messages:
1: replacing previous import 'NMF::entropy' by 'lsa::entropy' when loading 'Palimpsest'
2: replacing previous import 'NMF::dispersion' by 'plotrix::dispersion' when loading 'Palimpsest'
3: replacing previous import 'gplots::plotCI' by 'plotrix::plotCI' when loading 'Palimpsest'
Timing stopped at: 303.4 46.01 442
Timing stopped at: 142.8 24.07 633.5
Timing stopped at: 523.7 48.68 572.7
Error in deconvolution_nmf(input_data = propSVsByCat, type = "SV", range_of_sigs = 2:USER_sigs_range_top, : object 'steep_index' not found
In addition: Warning messages:
1: In (function (...) : NAs were produced due to errors in some of the runs:
-#2[r=3] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#4[r=5] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6[r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
2: Removed 30 rows containing missing values (geom_point).
Execution halted

nrun = 30 & range_of_sigs = 1:8

Warning messages:
1: replacing previous import 'NMF::entropy' by 'lsa::entropy' when loading 'Palimpsest'
2: replacing previous import 'NMF::dispersion' by 'plotrix::dispersion' when loading 'Palimpsest'
3: replacing previous import 'gplots::plotCI' by 'plotrix::plotCI' when loading 'Palimpsest'
Timing stopped at: 268.2 43.25 974.8
Timing stopped at: 15.77 5.16 362.2
Error in which.min(sapply(res.runs, "[[", "residuals")) : (list) object cannot be coerced to type 'double'
Calls: deconvolution_nmf ... nmf -> nmf -> nmf -> .local -> system.time -> run.all
In addition: Warning messages:
1: In (function (...) : NAs were produced due to errors in some of the runs:
-#5[r=6] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
-#6[r=7] -> (list) object cannot be coerced to type 'double' [in call to 'which.min']
2: Removed 20 rows containing missing values (geom_point).
Timing stopped at: 134.4 42.27 921.3
Execution halted

If I change range_of_sigs from 1:8 to 1:12, I'm able to get the signatures without errors for both nrun = 30 and nrun = 60.
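For readers puzzled by the message itself, the following base-R sketch shows one way the call named in the log, which.min(sapply(res.runs, "[[", "residuals")), can fail with a coercion error. The mock res.runs object and its values are assumptions made for illustration; the thread does not establish that this is what happened inside NMF.

```r
# Mock per-run results (an assumption for illustration); in the second run
# the "residuals" entry is empty, as might happen after a failed fit.
res.runs <- list(
  list(residuals = 0.42),
  list(residuals = numeric(0)),
  list(residuals = 0.17)
)

# The extracted elements have different lengths, so sapply() cannot
# simplify them to a numeric vector and returns a list instead.
res <- sapply(res.runs, "[[", "residuals")
print(is.list(res))

# which.min() needs something coercible to double and fails on this list.
msg <- tryCatch(which.min(res), error = function(e) conditionMessage(e))
print(msg)

# Defensive variant: force one numeric value per run, NA when missing.
safe <- vapply(
  res.runs,
  function(r) if (length(r$residuals) == 1) r$residuals else NA_real_,
  numeric(1)
)
best <- which.min(safe)  # NA entries are ignored by which.min()
print(best)
```

The defensive variant is only a sketch of how such a lookup could tolerate failed runs, not a description of the fix applied in either package.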

Best, André

jayendrashinde91 commented 5 years ago

Hi André,

We tried the 560 WGS breast cancer dataset at our end and were able to produce the results without the above-mentioned errors. It is difficult to say why this error is happening only at your end. Could you try reinstalling the NMF dependency and give it another try?

devtools::install_github("renozao/NMF")

Also, can you share your input matrix? I can try to reproduce the errors at my end again with your data this time. Best, Jay
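A minimal sketch of how an input matrix and the relevant package versions could be packaged for sharing is given below. The toy matrix stands in for the real input (propSVsByCat in this thread), and the file name and location are hypothetical.

```r
# Toy stand-in for the real input matrix (propSVsByCat in the thread).
toy_matrix <- matrix(
  runif(12), nrow = 4,
  dimnames = list(paste0("cat", 1:4), paste0("tumor", 1:3))
)

# Hypothetical output location; any writable path would do.
out_file <- file.path(tempdir(), "input_matrix.rds")
saveRDS(toy_matrix, out_file)

# Round-trip check: the recipient should get back an identical object.
restored <- readRDS(out_file)
stopifnot(identical(toy_matrix, restored))

# Report versions of the relevant packages, if they are installed,
# without loading them.
pkgs <- c("NMF", "Palimpsest")
is_installed <- vapply(
  pkgs, function(p) nzchar(system.file(package = p)), logical(1)
)
versions <- vapply(
  pkgs[is_installed],
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
print(versions)
```

Saving with saveRDS() preserves dimnames and numeric precision exactly, which makes it easier to reproduce a failure on another machine than a text export would.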

AndreVidas commented 5 years ago

Hi Jay,

Thanks, it works now that I have installed NMF from GitHub.

Best, André