ChristofferFlensburg / superFreq

Analysis pipeline for cancer sequencing data
MIT License
110 stars 33 forks source link

Maximum Samples Per Patient? #53

Closed mkinnaman closed 3 years ago

mkinnaman commented 4 years ago

Hi Again Christoff-

So I just tried to run on a multiregion case that I have - 14 WGS samples total and when it got to the clonality part of the pipeline I threw the following error:

Error in if (power < 10) return(NA) : missing value where TRUE/FALSE needed Calls: superFreq ... extractClonalities -> sapply -> sapply -> lapply -> FUN

I attached the runtime tracking log - let me know what you think...

runtimeTracking.log

ChristofferFlensburg commented 4 years ago

Hey!

There is no limit to the number of samples (we've done over 20 matched samples internally), although memory can become an issue, in particular for WGS.

The error is from the clonal tracking of copy numbers in getStories.R, in particular the part where it checks if it's the same allele that is affected across samples. That is, maybe one sample has gained an extra copy of the maternal allele, another sample gained the paternal allele. To do this, it first checks how much statistical power it has to look for this, from heterozygous SNPs reacting in opposite directions in different samples. It then checks if there is enough power to carry out this analysis, and if not power < 10, it skips it.

Seems that in your case, power was calculated to NA or 'NaN', which threw this error at this check. I had a look at how this is calculated, and could not find any glaring bugs.

Maybe as first step, have a look at the copy number plots (that were plotted to plots/erms1/CNV), and see if they look ok for all samples, in particular the MAFs in the second panel.

If they look decently sane, then I think I might have to ask for your copy number data to reproduce the error, if you are ok with that. They should be in [Rdirectory]/erms1/clusters.Rdata. Could you email to flensburg.c at wehi.edu.au. If you're not comfortable with that (which would be understandable, it's human sequencing data), we can try to zoom and I can have a look at your machine from remote.

mkinnaman commented 4 years ago

Thanks Cristoff,

I didnt see anything glaring from the CN profiles...I emailed you the clusters.Rdata file - appreciate your input.

Thanks,

Michael

ChristofferFlensburg commented 4 years ago

Thanks! I didn't receive any email though. :/ I double checked that I gave the correct email flensburg.c at wehi.edu.au , maybe double check on your end as well? Could also be a too large attachment... Could also try my gmail christoffer.flensburg at gmail.com? Thanks, sorry!

ChristofferFlensburg commented 4 years ago

I got it now, thanks! I probably won't have time to look at it this week though.

eberko commented 4 years ago

Hi Christoffer, I received the same error message. As requested, I am posting the sessionInfo below. Please let me know if you require any additional information, and thank you for your help!

R version 3.6.2 (2019-12-12) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /cm/shared/apps_chop/R/3.6.2/lib64/R/lib/libRblas.so LAPACK: /cm/shared/apps_chop/R/3.6.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] superFreq_1.3.2 BSgenome.Hsapiens.UCSC.hg38_1.4.1 [3] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome.Mmusculus.UCSC.mm10_1.4.0 [5] BSgenome_1.54.0 rtracklayer_1.46.0
[7] MutationalPatterns_1.12.0 NMF_0.22.0
[9] Biobase_2.46.0 cluster_2.1.0
[11] rngtools_1.5 pkgmaker_0.31.1
[13] registry_0.5-1 limma_3.42.2
[15] Rsubread_2.0.1 R.oo_1.23.0
[17] R.methodsS3_1.8.0 Rsamtools_2.2.3
[19] Biostrings_2.54.0 XVector_0.26.0
[21] biomaRt_2.42.1 GenomicRanges_1.38.0
[23] GenomeInfoDb_1.22.1 IRanges_2.20.2
[25] S4Vectors_0.24.4 BiocGenerics_0.32.0
[27] WriteXLS_5.0.0

loaded via a namespace (and not attached): [1] bitops_1.0-6 matrixStats_0.56.0
[3] bit64_0.9-7 doParallel_1.0.15
[5] RColorBrewer_1.1-2 progress_1.2.2
[7] httr_1.4.1 tools_3.6.2
[9] R6_2.4.1 DBI_1.1.0
[11] colorspace_1.4-1 withr_2.2.0
[13] tidyselect_1.0.0 prettyunits_1.1.1
[15] bit_1.1-15.2 curl_4.3
[17] compiler_3.6.2 ggdendro_0.1-20
[19] DelayedArray_0.12.3 scales_1.1.0
[21] askpass_1.1 rappdirs_0.3.1
[23] stringr_1.4.0 digest_0.6.25
[25] pkgconfig_2.0.3 bibtex_0.4.2.2
[27] dbplyr_1.4.3 rlang_0.4.5
[29] RSQLite_2.2.0 BiocParallel_1.20.1
[31] dplyr_0.8.5 VariantAnnotation_1.32.0
[33] RCurl_1.98-1.2 magrittr_1.5
[35] GenomeInfoDbData_1.2.2 Matrix_1.2-18
[37] Rcpp_1.0.4.6 munsell_0.5.0
[39] lifecycle_0.2.0 stringi_1.4.6
[41] MASS_7.3-51.5 SummarizedExperiment_1.16.1 [43] zlibbioc_1.32.0 plyr_1.8.6
[45] BiocFileCache_1.10.2 grid_3.6.2
[47] blob_1.2.1 crayon_1.3.4
[49] lattice_0.20-38 cowplot_1.0.0
[51] GenomicFeatures_1.38.2 hms_0.5.3
[53] pillar_1.4.3 reshape2_1.4.4
[55] codetools_0.2-16 XML_3.99-0.3
[57] glue_1.4.0 vctrs_0.2.4
[59] foreach_1.5.0 gtable_0.3.0
[61] openssl_1.4.1 purrr_0.3.4
[63] assertthat_0.2.1 ggplot2_3.3.0
[65] gridBase_0.4-7 xtable_1.8-4
[67] pracma_2.2.9 tibble_3.0.1
[69] iterators_1.0.12 GenomicAlignments_1.22.1
[71] AnnotationDbi_1.48.0 memoise_1.1.0
[73] ellipsis_0.3.0

mkinnaman commented 4 years ago

Christoff -

Did you ever get a chance to look at this?

Thanks, Michael

mark-welsh commented 4 years ago

I made these changes to "ignore" this error to try and get the analysis to finish. It winds up having issues when making parts of the river plots, but overall the results look correct 👍

@ChristofferFlensburg could you explain what the harm would be when skipping the NA values?

mkinnaman commented 4 years ago

Mark - your changes worked for me - results look good thus far. Thanks!

ChristofferFlensburg commented 4 years ago

Hi all!

I am so sorry for having dropped this ball, and thanks for finding a way to bypass the problem mark!

As I understand that change skips this part of the algorithm when NAs are found. So it'll just remove the ability to detect the same CNA affecting opposite alleles in different samples. Shouldn't have any consequences downstream.

I have a new version 1.4 almost ready with quite a few changes and bug fixes, that support R 4.0.0 and latest bioconductor, and should cope with all latest samtools version. I'll include your changes @mark-welsh, but with a warning message. Maybe then I'll have everyone (or someone) that has time in this thread to try the new version and see if the issue persists (ie get the warning).

Can i merge your branch directly mark, or do you need to do a pull request? Sorry I'm n00b at github!

Sorry again for the more than 2 month delay...

mark-welsh commented 4 years ago

@ChristofferFlensburg thanks for the update! Yes, I can submit a PR to this repo with the changes, I think that will be the easiest way. I would also be happy to test the latest version once it's ready, so please keep me posted!

ChristofferFlensburg commented 3 years ago

Closing old issues.