HenrikBengtsson / PSCBS

🔬 R package: Analysis of Parent-Specific DNA Copy Numbers
https://cran.r-project.org/package=PSCBS
7 stars 4 forks source link

Error in Calling NTCN in Paired mode #63

Open hannanw opened 3 years ago

hannanw commented 3 years ago

Hi,

When calling NTCN on paired data, I am getting an error "Less than two modes were found in the empirical density of C1 after removing %d modes that are too weak (density < %g): %d", nModes - nrow(fit), minDensity, nrow(fit)

The data comes from a raw CEL file (from TCGA) and I processed it using PennCNV and APT to get the LRR and BAF values for each probe. I then converted the LRR to TCN (Total Copy Number) ratio with the following method 2*(2**LRR) as needed for input into PSCBS.

After which I am able to call AB and LoH without any issues but once I try to call NTCN that is when I get the error. What could be the possible reason behind this error and is there any way to resolve it? Any help or advice will be greatly appreciated, thanks!

The code and errors are below:

> head(data)
  chromosome       x       CT  betaT       CN  betaN
1          1 1156131 1.757384 0.8464 1.585239 0.9960
2          1 2234251 2.172085 0.5926 2.519673 0.4870
3          1 2329564 2.046717 1.0000 2.110022 0.9965
4          1 2553624 2.168920 0.5210 2.339661 0.5018
5          1 2936870 2.758049 0.0000 2.420881 0.0000
6          1 2951834 2.683242 0.0456 2.337104 0.0000
> data <- dropSegmentationOutliers(data)
> gaps <- findLargeGaps(data, minLength = 1e+06)
> knownSegments <- gapsToSegments(gaps)
> fit.ps <- segmentByPairedPSCBS(data, knownSegments = knownSegments, joinSegments=TRUE,avgDH="median")
> deltaAB.ps <- estimateDeltaAB(fit.ps, flavor="qq(DH)")
> deltaAB.ps
[1] 0.1855768
> fit.ps <- callAB(fit.ps, delta=deltaAB.ps)
> deltaLOH.ps <- estimateDeltaLOH(fit.ps, flavor="minC1|nonAB")
> deltaLOH.ps
[1] 0.503962
> fit.ps <- callLOH(fit.ps, delta=deltaLOH.ps)
> fit.ps <- callNTCN(fit.ps)
Error in throw.default(sprintf("Less than two modes were found in the empirical density of C1 after removing %d modes that are too weak (density < %g): %d",  : 
  Less than two modes were found in the empirical density of C1 after removing 2 modes that are too weak (density < 0.2): 1
In addition: Warning message:
R.methodsS3::throw() is deprecated. Use base::stop() instead, or R.oo::throw(). 
> traceback()
33: stop(...)
32: throw.default(sprintf("Less than two modes were found in the empirical density of C1 after removing %d modes that are too weak (density < %g): %d", 
        nModes - nrow(fit), minDensity, nrow(fit)))
31: throw(sprintf("Less than two modes were found in the empirical density of C1 after removing %d modes that are too weak (density < %g): %d", 
        nModes - nrow(fit), minDensity, nrow(fit)))
30: estimateKappaByC1Density.PairedPSCBS(this, ...)
29: estimateKappaByC1Density(this, ...)
28: estimateKappa.PairedPSCBS(fit)
27: estimateKappa(fit)
26: getVector.Arguments(static, x, ..., .name = .name)
25: getVector(static, x, ..., .name = .name)
24: getNumerics.Arguments(static, ..., asMode = "double", disallow = disallow)
23: getNumerics(static, ..., asMode = "double", disallow = disallow)
22: getDoubles.Arguments(static, ..., length = length)
21: getDoubles(static, ..., length = length)
20: getDouble.Arguments(static, ...)
19: getDouble(static, ...) at <text>#1
18: Arguments$getDouble(kappa, range = c(0, 1), disallow = disallow)
17: estimateDeltaCN.PairedPSCBS(fit)
16: estimateDeltaCN(fit)
15: getVector.Arguments(static, x, ..., .name = .name)
14: getVector(static, x, ..., .name = .name)
13: getNumerics.Arguments(static, ..., asMode = "double", disallow = disallow)
12: getNumerics(static, ..., asMode = "double", disallow = disallow)
11: getDoubles.Arguments(static, ..., length = length)
10: getDoubles(static, ..., length = length)
9: getDouble.Arguments(static, ...)
8: getDouble(static, ...) at <text>#1
7: Arguments$getDouble(delta, range = c(0, Inf), disallow = disallow)
6: callCopyNeutralByTCNofAB.PairedPSCBS(fit, ..., force = force)
5: callCopyNeutralByTCNofAB(fit, ..., force = force)
4: callCopyNeutral.PairedPSCBS(...)
3: callCopyNeutral(...)
2: callNTCN.PairedPSCBS(fit.ps)
1: callNTCN(fit.ps)

The sessionInfo() is below

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /opt/R/4.0.2/lib/R/lib/libRblas.so
LAPACK: /opt/R/4.0.2/lib/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DNAcopy_1.64.0  Hmisc_4.5-0     ggplot2_3.3.3   Formula_1.2-4  
[5] survival_3.1-12 lattice_0.20-41 PSCBS_0.65.0   

loaded via a namespace (and not attached):
 [1] xfun_0.22           tidyselect_1.1.0    purrr_0.3.4        
 [4] listenv_0.8.0       splines_4.0.2       colorspace_2.0-0   
 [7] vctrs_0.3.6         generics_0.1.0      htmltools_0.5.1.1  
[10] base64enc_0.1-3     utf8_1.2.1          rlang_0.4.10       
[13] R.oo_1.24.0         pillar_1.5.1        foreign_0.8-80     
[16] glue_1.4.2          withr_2.4.1         DBI_1.1.1          
[19] R.utils_2.10.1      aroma.light_3.20.0  RColorBrewer_1.1-2 
[22] matrixStats_0.58.0  jpeg_0.1-9          R.cache_0.14.0     
[25] lifecycle_1.0.0     stringr_1.4.0       munsell_0.5.0      
[28] gtable_0.3.0        R.methodsS3_1.8.1   future_1.21.0      
[31] htmlwidgets_1.5.3   codetools_0.2-16    knitr_1.31         
[34] latticeExtra_0.6-29 parallel_4.0.2      fansi_0.4.2        
[37] htmlTable_2.2.1     scales_1.1.1        backports_1.2.1    
[40] checkmate_2.0.0     parallelly_1.24.0   gridExtra_2.3      
[43] png_0.1-7           digest_0.6.27       stringi_1.5.3      
[46] dplyr_1.0.5         grid_4.0.2          tools_4.0.2        
[49] magrittr_2.0.1      tibble_3.1.0        cluster_2.1.0      
[52] crayon_1.4.1        pkgconfig_2.0.3     ellipsis_0.3.1     
[55] Matrix_1.2-18       data.table_1.14.0   rstudioapi_0.13    
[58] assertthat_0.2.1    R6_2.5.0            globals_0.14.0     
[61] rpart_4.1-15        nnet_7.3-14         compiler_4.0.2
HenrikBengtsson commented 3 years ago

Hi. Just to be clear, you're actually getting the error:

Error in throw.default(sprintf("Less than two modes were found in the empirical density of C1 after removing %d modes that are too weak (density < %g): %d",  : 
  Less than two modes were found in the empirical density of C1 after removing 2 modes that are too weak (density < 0.2): 1

which carries some more clues what's going on. Now, before continuing, I want to emphasize that calling copy number states (e.g. loss, neutral, gain, LOH) is complicated and should not really be assumed to be work automatically in all cases. Sometimes it works, but likely you've experienced not always.

Now, to the error:

Less than two modes were found in the empirical density of C1 after removing 2 modes that are too weak (density < 0.2): 1

This could simply be because there's nothing going on in the genome, i.e. the copy-number profile is really flat. When that happens, it's not possible to call CN states. So, I suggest to first plot the copy-number profile to see what it looks like.

hannanw commented 3 years ago

Hi Henrik,

Ahh okay, yea I understand that it will not work all the time. In cases such as this (when the copy-number profile is really flat) are the values reported in the c1Mean, c2Mean, ABCall, and lohCall still reliable, and can they be used for downstream analysis? I have also attached the plot below and as you suspected the copy-number profile does look rather flat.

Also, just out of curiosity do you have any thoughts/suggestions on how I can estimate the tumor purity and ploidy for my samples from the PSCBS segmentation output? Thanks!

issue1

HenrikBengtsson commented 2 years ago

Also, just out of curiosity do you have any thoughts/suggestions on how I can estimate the tumor purity and ploidy for my samples from the PSCBS segmentation output? Thanks!

You can use the following as a proxy for estimating the normal contamination (= 1 - tumor purity):

kappa <- estimateKappa(fit)

We have always been very conservative in claiming we can estimate normalization contamination, but we hint at it in Section 4.4.1 'Tuning parameters' of the 'Parent-specific copy-number segmentation using Paired PSCBS' vignette

Normal contamination is a well-understood and well-defined parameter, i.e. how much normal cells do we have in the mix of all cells. However, tumor ploidy is a rather poorly defined parameter and things like tumor clonality complicates it further as a confound factor. Because of this we never got to the point of making a serious attempt in estimating the tumor ploidy. So, there is no function in PSCBS for estimating the ploidy. I think facets provides some estimates on it.