chrisamiller / copyCat

a parallel R package for detecting copy-number alterations from short sequencing reads

GC content bias correction problem #11

Closed: leiendeckerlu closed this issue 3 years ago

leiendeckerlu commented 4 years ago

Hi there,

I was running copyCat on several paired tumor/normal (T/N) samples. It runs perfectly fine for most of them, but I've found a few T/N pairs where the GC bias correction for the normal sample looks wrong (see example below).

Do you have any idea what might be causing this?

I run copyCat like this:

runPairedSampleAnalysis(annotationDirectory="../annotations", outputDirectory="./output",
    normal="24N.regions.sub.clean3.bed", tumor="24T.regions.sub.clean3.bed",
    inputType="bins", maxCores=12, binSize=0, perLibrary=1, perReadLength=1,
    verbose=TRUE, minWidth=3, minMapability=0.6, dumpBins=TRUE,
    doGcCorrection=TRUE, samtoolsFileFormat="unknown", purity=1,
    normalSamtoolsFile=NULL, tumorSamtoolsFile=NULL)

Attached plots: normal.gccontent.lib1.readLength150.pdf, tumor.gccontent.lib1.readLength150.pdf

Thank you, Lukas

chrisamiller commented 4 years ago

Judging from that plot, the mean number of reads in your normal sample appears to be very low (compare with the tumor, which has ~60-70x depth in each bin). Can you verify that the input data (both the BAMs and the windowed counts files) are sane?
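One way to check the windowed counts is to summarise the per-bin counts of the normal and tumor files side by side. The sketch below is a minimal, hypothetical helper (`summariseBins` is not part of copyCat). It assumes each bins file is tab-separated with no header and four columns (chromosome, start, end, read count); adjust `col.names` and `colClasses` if your files are laid out differently.

```r
# Hypothetical helper, not part of copyCat.
# ASSUMPTION: the bins file is tab-separated, has no header, and has
# four columns: chromosome, start, end, read count.
summariseBins <- function(path) {
  bins <- read.table(path, sep = "\t", header = FALSE,
                     col.names = c("chrom", "start", "end", "count"),
                     colClasses = c("character", "numeric", "numeric", "numeric"))
  c(bins = nrow(bins),                       # number of windows in the file
    mean = mean(bins$count),                 # mean reads per window
    median = median(bins$count),             # median reads per window
    zeroFraction = mean(bins$count == 0))    # share of windows with no reads
}
```

For example, `rbind(normal = summariseBins("24N.regions.sub.clean3.bed"), tumor = summariseBins("24T.regions.sub.clean3.bed"))` puts both summaries next to each other. A normal mean far below the tumor's, or a large fraction of zero-count windows, would suggest a problem upstream of the GC correction step.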