PeeperLab / CopywriteR

DNA copy number detection from off-target sequence data
GNU General Public License v3.0
28 stars 10 forks source link

CopywriteR crashes when "calculating extension cutoff for every peak" #17

Open messersc opened 6 years ago

messersc commented 6 years ago

We have found a few WES tumor/normal pairs where CopywriteR crashes. I don't quite understand why that happens, i.e. which input triggers this, but I tracked down the combinations of data that lead to this.

This is the error:

This analysis will be run on 8 cpus 
Error in while (cov.chr[index] > lower.cutoff.peaks[i]) { : 
  argument is of length zero
Calls: do.all ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

This is the offending chunk of code (https://github.com/PeeperLab/CopywriteR/blob/master/R/CopywriteR.R#L447):

if (nrow(test) > 0) {
  for (i in seq_len(nrow(test))) {
    index <- test[i, "start"]
    while (cov.chr[index] > lower.cutoff.peaks[i]) {
      index = index - 1
    }
    test[i, "start"] <- index - read.length
    index <- test[i, "end"]
    while (cov.chr[index] > lower.cutoff.peaks[i]) {
      index = index + 1
    }
    test[i, "end"] <- index + read.length
  }
}

In the sample I am looking at, we start at test[i, "start"] = 154. The coverage at lower.cutoff.peaks[1] is 2. Now we move from position 154 down, until we find a position in cov.chr equal or smaller 2. If that does not happen, trying to get cov.chr[0] leads to the crash. I guess the function assumes that at least at cov.chr[1] the while loop should exit, but I don't know why this does not happen. In any case, this should be caught and dealt with.

I modified the code to print print(paste(cov.chr[index], lower.cutoff.peaks[i])), this is the output:

[1] "8 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "7 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "6 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "4 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "5 2"
[1] "3 2"
[1] "3 2"
[1] "3 2"

Best, Clemens

thomasKuilman commented 6 years ago

Hi Clemens,

This might indeed be a serious mistake in the code, although we have never heard about this potential bug. Would it be possible to run one of your sample pairs that shows the error, up to line 426, and then save it as an image using save.image()? If you could then send me the image, that would be a great way to start debugging. Also, just to be sure, could you provide me with a sessionInfo() and the log file?

Thanks and best regards,

Thomas


Thomas Kuilman, PhD Department of Molecular Oncology & Immunology Netherlands Cancer Institute 1066 CX Amsterdam The Netherlands

Phone: +31-20-5121841

On 24 Aug 2017, at 16:47, Clemens Messerschmidt notifications@github.com<mailto:notifications@github.com> wrote:

We have found a few WES tumor/normal pairs where CopywriteR crashes. I don't quite understand why that happens, i.e. which input triggers this, but I tracked down the combinations of data that lead to this.

This is the error:

This analysis will be run on 8 cpus Error in while (cov.chr[index] > lower.cutoff.peaks[i]) { : argument is of length zero Calls: do.all ... tryCatch -> tryCatchList -> tryCatchOne ->

This is the offending chunk of code (https://github.com/PeeperLab/CopywriteR/blob/master/R/CopywriteR.R#L447):

if (nrow(test) > 0) { for (i in seq_len(nrow(test))) { index <- test[i, "start"] while (cov.chr[index] > lower.cutoff.peaks[i]) { index = index - 1 } test[i, "start"] <- index - read.length index <- test[i, "end"] while (cov.chr[index] > lower.cutoff.peaks[i]) { index = index + 1 } test[i, "end"] <- index + read.length } }

In the sample I am looking at, we start at test[i, "start"] = 154. The coverage at lower.cutoff.peaks[1] is 2. Now we move from position 154 down, until we find a position in cov.chr equal or smaller 2. If that does not happen, trying to get cov.chr[0] leads to the crash. I guess the function assumes that at least at cov.chr[1] the while loop should exit, but I don't know why this does not happen. In any case, this should be caught and dealt with.

I modified the code to print print(paste(cov.chr[index], lower.cutoff.peaks[i])), this is the output:

[1] "8 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "7 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "6 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "3 2" [1] "3 2" [1] "3 2" [1] "3 2" [1] "3 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "4 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "5 2" [1] "3 2" [1] "3 2" [1] "3 2"

Best, Clemens

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/PeeperLab/CopywriteR/issues/17, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIGKlh4HMtLJ1p4xn-g35z_eTkOpHLdmks5sbYzwgaJpZM4PBfAr.

messersc commented 6 years ago

Hi Thomas,

I would be happy to provide you with more information to debug this. First, here is the full log log.txt and the sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
[1] C

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] CGHcall_2.36.0                         
 [2] snowfall_1.84-6.1                      
 [3] snow_0.4-2                             
 [4] CGHbase_1.34.0                         
 [5] marray_1.52.0                          
 [6] limma_3.30.13                          
 [7] impute_1.48.0                          
 [8] CopywriteR_2.0.6                       
 [9] BiocParallel_1.8.2                     
[10] DNAcopy_1.48.0                         
[11] Rsamtools_1.26.2                       
[12] Biostrings_2.42.1                      
[13] XVector_0.14.1                         
[14] matrixStats_0.52.2                     
[15] assertthat_0.2.0                       
[16] dplyr_0.7.2                            
[17] org.Hs.eg.db_3.4.0                     
[18] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[19] GenomicFeatures_1.26.4                 
[20] AnnotationDbi_1.36.2                   
[21] Biobase_2.34.0                         
[22] GenomicRanges_1.26.4                   
[23] GenomeInfoDb_1.10.3                    
[24] IRanges_2.8.2                          
[25] S4Vectors_0.12.2                       
[26] BiocGenerics_0.20.0                    

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.4.0 gtools_3.5.0              
 [3] lattice_0.20-31            rtracklayer_1.34.2        
 [5] blob_1.1.0                 XML_3.98-1.9              
 [7] rlang_0.1.2                glue_1.1.1                
 [9] DBI_0.7                    bit64_0.9-7               
[11] RColorBrewer_1.1-2         lambda.r_1.1.9            
[13] bindrcpp_0.2               bindr_0.1                 
[15] CopyhelpeR_1.6.0           zlibbioc_1.20.0           
[17] futile.logger_1.4.3        hwriter_1.3.2             
[19] memoise_1.1.0              chipseq_1.24.0            
[21] latticeExtra_0.6-28        biomaRt_2.30.0            
[23] Rcpp_0.12.12               ShortRead_1.32.1          
[25] bit_1.1-12                 digest_0.6.12             
[27] grid_3.3.2                 tools_3.3.2               
[29] bitops_1.0-6               magrittr_1.5              
[31] RCurl_1.95-4.8             tibble_1.3.3              
[33] RSQLite_2.0                futile.options_1.0.0      
[35] pkgconfig_2.0.1            Matrix_1.2-10             
[37] data.table_1.10.4          R6_2.2.2                  
[39] GenomicAlignments_1.10.1  

Sorry, I should have added this in the beginning.

If I do save.image at L426 I don't see any objects relevant to the question, but this is probably because we use a wrapper function and I'm saving the wrong environment... I have separately saved

cov.chr.RData
lower.cutoff.peaks.RData
read.length.RData
test.RData

though. If those are sufficient to research this, I would send those via email to you. Otherwise please tell me which other objects would be needed.

Thanks for your help! Clemens

Raey2014 commented 6 years ago

Hi Clemens,

Did you solve the problem? I have the same error in running CopywriteR in WES data.

messersc commented 6 years ago

Hello Raey,

unfortunately I was not able to solve this problem up to now.

My suspicion at the moment is that we are using an incorrect bed file for the WES enrichment kit (the data are pretty old, were sequenced somewhere else and metadata is sparse).

@thomasKuilman Do you think that this could be a reason for this crash?