chrisamiller / copyCat

a parallel R package for detecting copy-number alterations from short sequencing reads
Other
22 stars 10 forks source link

Error in { : task 1 failed - "undefined columns selected" #10

Closed maryawood closed 5 years ago

maryawood commented 5 years ago

Hello,

I'm trying to run copyCat for some WES sequencing samples with GRCh38. Here is my command (actual paths replaced with placeholders):

runPairedSampleAnalysis(annotationDirectory="/PATH/TO/ANNOTATION/DIRECTORY/",
                        outputDirectory="/PATH/TO/OUTPUT/DIRECTORY/",
                        normal="/PATH/TO/NORMAL/BAM_WINDOW",
                        tumor="/PATH/TO/TUMOR/BAM_WINDOW",
                        inputType="bins",
                        maxCores=2,
                        binSize=0, 
                        perLibrary=1,
                        perReadLength=1, 
                        verbose=TRUE,
                        gcWindowSize=100,
                        readLength=100,
                        minWidth=3, 
                        minMapability=0.6,
                        dumpBins=TRUE,
                        doGcCorrection=TRUE,
                        samtoolsFileFormat="unknown", 
                        purity=1,
                        normalSamtoolsFile="/PATH/TO/NORMAL/PILEUP",
                        tumorSamtoolsFile="/PATH/TO/TUMOR/PILEUP")

The annotation directory contains the entrypoints.female, entrypoints.male, and gaps.bed files, as well as the readlength.100 directory. My samples have reads ~100 bp in length.

When I run the command, I get the following as output:

[1] "inferred bin size:  1000"
[1] "WARNING: bins file wasn't created with per-library option, will not perform per-library correction"
[1] "WARNING: bins file wasn't created with per-read-length option, will not perform per-read-length correction"
[1] "calculating mapability content for read length 100 Tue Jul  9 09:43:21 2019"
Error in { : task 1 failed - "undefined columns selected"

Any idea what might be going wrong here?

Thanks,

Mary

chrisamiller commented 5 years ago

Sorry for the slow response! I'm not sure exactly which stage is failing, but it sounds like one of your inputs isn't quite right. Did you generate your own mapability files or did you use the pre-built ones? Can you paste the head of your tumor and normal bins file? One of those might offer some hints!

maryawood commented 5 years ago

Thanks for getting back to me! Here is the head of the normal bins file:

Chr Start   Counts
chr1    1   0
chr1    1001    0
chr1    2001    0
chr1    3001    0
chr1    4001    0
chr1    5001    0
chr1    6001    0
chr1    7001    0
chr1    8001    0

And the tumor one:

Chr Start   Counts
chr1    1   0
chr1    1001    0
chr1    2001    0
chr1    3001    0
chr1    4001    0
chr1    5001    0
chr1    6001    0
chr1    7001    0
chr1    8001    0

I did check, and not all the values in the counts column are zero! Other bins have higher counts in both the tumor and normal files.

I used the pre-built mappability files.

chrisamiller commented 5 years ago

okay, those seem fine! Can you run traceback() immediately after the failure, so we can try to trace back the call that's causing the issue?

maryawood commented 5 years ago

Sure! Here is the result of calling traceback(), paths omitted:

> traceback()
5: stop(simpleError(msg, call = expr))
4: e$fun(obj, substitute(ex), parent.frame(), e$data)
3: foreach(chr = rdo2@entrypoints$chr, .combine = "combineBins", 
       .options.multicore = mcoptions) %dopar% {
       binMap(rdo2, chr, len)
   }
2: mapCorrect(rdo, skipCorrection = TRUE, minMapability = minMapability)
1: runPairedSampleAnalysis(annotationDirectory = "[PATH]", 
       outputDirectory = "[PATH]", 
       normal = "[PATH]", 
       tumor = "[PATH]", 
       inputType = "bins", maxCores = 2, binSize = 0, perLibrary = 1, 
       perReadLength = 1, verbose = TRUE, gcWindowSize = 100, readLength = 100, 
       minWidth = 3, minMapability = 0.6, dumpBins = TRUE, doGcCorrection = TRUE, 
       samtoolsFileFormat = "unknown", purity = 1, normalSamtoolsFile = "[PATH]", 
       tumorSamtoolsFile = "[PATH]")
chrisamiller commented 5 years ago

Is there a mismatch between "chr1" and "1" in your data and the annotation files? If so, you'll need to modify one of them to match

If that is not the issue, can you share your window files for normal and tumor? Those should not contain identifying information (though the pileups might)

maryawood commented 5 years ago

Looks like there was a mismatch! I adjusted my chromosome IDs in both the window files and pileup files to match the annotation, but I'm unfortunately still getting the same error.

It looks like the window files are too big for me to share here - do you have a preferred way for me to send them?

chrisamiller commented 5 years ago

another question before sharing files - do your window files contain chromosomes other than 1:22,X,Y? If so, that might cause the same issue

maryawood commented 5 years ago

Good point! The window and pileup files did both have other chromosomes, so I edited them to only contain 1:22,X,Y, but unfortunately I'm still getting the same error.

chrisamiller commented 5 years ago

Okay, I'd suggest gzipping the window files and posting them wherever is convenient for you (Dropbox share?). Let me know when they're up and I'll take a look!

maryawood commented 5 years ago

Looks like the window files once cut down to just the chromosomes 1:22,X,Y and gzipped are small enough! Here they are:

normal.bam_window.txt.gz tumor.bam_window.txt.gz

chrisamiller commented 5 years ago

Just to confirm - is this WGS data? The bins look very uneven - more like exome or some other targeted capture, with spikes of high coverage surrounded by low or zero coverage. CopyCat will not work on non-WGS, as it wasn't designed with that in mind (and I wouldn't be surprised if that's why it was crashing, though it should certainly output a more informative error!)

maryawood commented 5 years ago

You're right, this is WES data - sorry, I didn't realize that this was the case. Thank you for your time to address the problem!