Closed maryawood closed 5 years ago
Sorry for the slow response! I'm not sure exactly which stage is failing, but it sounds like one of your inputs isn't quite right. Did you generate your own mapability files or did you use the pre-built ones? Can you paste the head of your tumor and normal bins file? One of those might offer some hints!
Thanks for getting back to me! Here is the head of the normal bins file:
Chr Start Counts
chr1 1 0
chr1 1001 0
chr1 2001 0
chr1 3001 0
chr1 4001 0
chr1 5001 0
chr1 6001 0
chr1 7001 0
chr1 8001 0
And the tumor one:
Chr Start Counts
chr1 1 0
chr1 1001 0
chr1 2001 0
chr1 3001 0
chr1 4001 0
chr1 5001 0
chr1 6001 0
chr1 7001 0
chr1 8001 0
I did check, and not all the values in the counts column are zero! Other bins have higher counts in both the tumor and normal files.
I used the pre-built mappability files.
okay, those seem fine! Can you run traceback()
immediately after the failure, so we can try to trace back the call that's causing the issue?
Sure! Here is the result of calling traceback()
, paths omitted:
> traceback()
5: stop(simpleError(msg, call = expr))
4: e$fun(obj, substitute(ex), parent.frame(), e$data)
3: foreach(chr = rdo2@entrypoints$chr, .combine = "combineBins",
.options.multicore = mcoptions) %dopar% {
binMap(rdo2, chr, len)
}
2: mapCorrect(rdo, skipCorrection = TRUE, minMapability = minMapability)
1: runPairedSampleAnalysis(annotationDirectory = "[PATH]",
outputDirectory = "[PATH]",
normal = "[PATH]",
tumor = "[PATH]",
inputType = "bins", maxCores = 2, binSize = 0, perLibrary = 1,
perReadLength = 1, verbose = TRUE, gcWindowSize = 100, readLength = 100,
minWidth = 3, minMapability = 0.6, dumpBins = TRUE, doGcCorrection = TRUE,
samtoolsFileFormat = "unknown", purity = 1, normalSamtoolsFile = "[PATH]",
tumorSamtoolsFile = "[PATH]")
Is there a mismatch between "chr1" and "1" in your data and the annotation files? If so, you'll need to modify one of them to match
If that is not the issue, can you share your window files for normal and tumor? Those should not contain identifying information (though the pileups might)
Looks like there was a mismatch! I adjusted my chromosome IDs in both the window files and pileup files to match the annotation, but I'm unfortunately still getting the same error.
It looks like the window files are too big for me to share here - do you have a preferred way for me to send them?
another question before sharing files - do your window files contain chromosomes other than 1:22,X,Y? If so, that might cause the same issue
Good point! The window and pileup files did both have other chromosomes, so I edited them to only contain 1:22,X,Y, but unfortunately I'm still getting the same error.
Okay, I'd suggest gzipping the window files and posting them wherever is convenient for you (Dropbox share?). Let me know when they're up and I'll take a look!
Looks like the window files once cut down to just the chromosomes 1:22,X,Y and gzipped are small enough! Here they are:
Just to confirm - is this WGS data? The bins look very uneven - more like exome or some other targeted capture, with spikes of high coverage surrounded by low or zero coverage. CopyCat will not work on non-WGS, as it wasn't designed with that in mind (and I wouldn't be surprised if that's why it was crashing, though it should certainly output a more informative error!)
You're right, this is WES data - sorry, I didn't realize that this was the case. Thank you for your time to address the problem!
Hello,
I'm trying to run copyCat for some WES sequencing samples with GRCh38. Here is my command (actual paths replaced with placeholders):
The annotation directory contains the
entrypoints.female
,entrypoints.male
, andgaps.bed
files, as well as thereadlength.100
directory. My samples have reads ~100 bp in length.When I run the command, I get the following as output:
Any idea what might be going wrong here?
Thanks,
Mary