Closed mkinnaman closed 3 years ago
Hey!
There is no limit to the number of samples (we've done over 20 matched samples internally), although memory can become an issue, in particular for WGS.
The error is from the clonal tracking of copy numbers in getStories.R
, in particular the part where it checks if it's the same allele that is affected across samples. That is, maybe one sample has gained an extra copy of the maternal allele, another sample gained the paternal allele. To do this, it first checks how much statistical power
it has to look for this, from heterozygous SNPs reacting in opposite directions in different samples. It then checks if there is enough power
to carry out this analysis, and if not power < 10
, it skips it.
Seems that in your case, power was calculated to NA
or 'NaN', which threw this error at this check. I had a look at how this is calculated, and could not find any glaring bugs.
Maybe as first step, have a look at the copy number plots (that were plotted to plots/erms1/CNV), and see if they look ok for all samples, in particular the MAFs in the second panel.
If they look decently sane, then I think I might have to ask for your copy number data to reproduce the error, if you are ok with that. They should be in [Rdirectory]/erms1/clusters.Rdata
. Could you email to flensburg.c at wehi.edu.au. If you're not comfortable with that (which would be understandable, it's human sequencing data), we can try to zoom and I can have a look at your machine from remote.
Thanks Cristoff,
I didnt see anything glaring from the CN profiles...I emailed you the clusters.Rdata file - appreciate your input.
Thanks,
Michael
Thanks! I didn't receive any email though. :/ I double checked that I gave the correct email flensburg.c at wehi.edu.au , maybe double check on your end as well? Could also be a too large attachment... Could also try my gmail christoffer.flensburg at gmail.com? Thanks, sorry!
I got it now, thanks! I probably won't have time to look at it this week though.
Hi Christoffer, I received the same error message. As requested, I am posting the sessionInfo below. Please let me know if you require any additional information, and thank you for your help!
R version 3.6.2 (2019-12-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)
Matrix products: default BLAS: /cm/shared/apps_chop/R/3.6.2/lib64/R/lib/libRblas.so LAPACK: /cm/shared/apps_chop/R/3.6.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base
other attached packages:
[1] superFreq_1.3.2 BSgenome.Hsapiens.UCSC.hg38_1.4.1
[3] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome.Mmusculus.UCSC.mm10_1.4.0
[5] BSgenome_1.54.0 rtracklayer_1.46.0
[7] MutationalPatterns_1.12.0 NMF_0.22.0
[9] Biobase_2.46.0 cluster_2.1.0
[11] rngtools_1.5 pkgmaker_0.31.1
[13] registry_0.5-1 limma_3.42.2
[15] Rsubread_2.0.1 R.oo_1.23.0
[17] R.methodsS3_1.8.0 Rsamtools_2.2.3
[19] Biostrings_2.54.0 XVector_0.26.0
[21] biomaRt_2.42.1 GenomicRanges_1.38.0
[23] GenomeInfoDb_1.22.1 IRanges_2.20.2
[25] S4Vectors_0.24.4 BiocGenerics_0.32.0
[27] WriteXLS_5.0.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.56.0
[3] bit64_0.9-7 doParallel_1.0.15
[5] RColorBrewer_1.1-2 progress_1.2.2
[7] httr_1.4.1 tools_3.6.2
[9] R6_2.4.1 DBI_1.1.0
[11] colorspace_1.4-1 withr_2.2.0
[13] tidyselect_1.0.0 prettyunits_1.1.1
[15] bit_1.1-15.2 curl_4.3
[17] compiler_3.6.2 ggdendro_0.1-20
[19] DelayedArray_0.12.3 scales_1.1.0
[21] askpass_1.1 rappdirs_0.3.1
[23] stringr_1.4.0 digest_0.6.25
[25] pkgconfig_2.0.3 bibtex_0.4.2.2
[27] dbplyr_1.4.3 rlang_0.4.5
[29] RSQLite_2.2.0 BiocParallel_1.20.1
[31] dplyr_0.8.5 VariantAnnotation_1.32.0
[33] RCurl_1.98-1.2 magrittr_1.5
[35] GenomeInfoDbData_1.2.2 Matrix_1.2-18
[37] Rcpp_1.0.4.6 munsell_0.5.0
[39] lifecycle_0.2.0 stringi_1.4.6
[41] MASS_7.3-51.5 SummarizedExperiment_1.16.1
[43] zlibbioc_1.32.0 plyr_1.8.6
[45] BiocFileCache_1.10.2 grid_3.6.2
[47] blob_1.2.1 crayon_1.3.4
[49] lattice_0.20-38 cowplot_1.0.0
[51] GenomicFeatures_1.38.2 hms_0.5.3
[53] pillar_1.4.3 reshape2_1.4.4
[55] codetools_0.2-16 XML_3.99-0.3
[57] glue_1.4.0 vctrs_0.2.4
[59] foreach_1.5.0 gtable_0.3.0
[61] openssl_1.4.1 purrr_0.3.4
[63] assertthat_0.2.1 ggplot2_3.3.0
[65] gridBase_0.4-7 xtable_1.8-4
[67] pracma_2.2.9 tibble_3.0.1
[69] iterators_1.0.12 GenomicAlignments_1.22.1
[71] AnnotationDbi_1.48.0 memoise_1.1.0
[73] ellipsis_0.3.0
Christoff -
Did you ever get a chance to look at this?
Thanks, Michael
I made these changes to "ignore" this error to try and get the analysis to finish. It winds up having issues when making parts of the river plots, but overall the results look correct 👍
@ChristofferFlensburg could you explain what the harm would be when skipping the NA
values?
Mark - your changes worked for me - results look good thus far. Thanks!
Hi all!
I am so sorry for having dropped this ball, and thanks for finding a way to bypass the problem mark!
As I understand that change skips this part of the algorithm when NAs are found. So it'll just remove the ability to detect the same CNA affecting opposite alleles in different samples. Shouldn't have any consequences downstream.
I have a new version 1.4 almost ready with quite a few changes and bug fixes, that support R 4.0.0 and latest bioconductor, and should cope with all latest samtools version. I'll include your changes @mark-welsh, but with a warning message. Maybe then I'll have everyone (or someone) that has time in this thread to try the new version and see if the issue persists (ie get the warning).
Can i merge your branch directly mark, or do you need to do a pull request? Sorry I'm n00b at github!
Sorry again for the more than 2 month delay...
@ChristofferFlensburg thanks for the update! Yes, I can submit a PR to this repo with the changes, I think that will be the easiest way. I would also be happy to test the latest version once it's ready, so please keep me posted!
Closing old issues.
Hi Again Christoff-
So I just tried to run on a multiregion case that I have - 14 WGS samples total and when it got to the clonality part of the pipeline I threw the following error:
Error in if (power < 10) return(NA) : missing value where TRUE/FALSE needed Calls: superFreq ... extractClonalities -> sapply -> sapply -> lapply -> FUN
I attached the runtime tracking log - let me know what you think...
runtimeTracking.log