etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
560 stars 166 forks source link

cnvkit.py coverage does not honor -q option #913

Open ishida-md opened 2 months ago

ishida-md commented 2 months ago

Hi, I am running the latest cnvkit image on the docker hub. I get spurious calls where many reads with mapq = 0. I tried cnvkit.py coverage with the -q option, but it does not do anything. Is this a bug?

Here is the snippet: singularity exec -e --env OMP_NUM_THREADS=1 --bind /home cnvkit_0.9.11.sif cnvkit.py coverage -o mapq0.cnn test.bam xgen.t750.bed

singularity exec -e --env OMP_NUM_THREADS=1 --bind /home cnvkit_0.9.11.sif cnvkit.py coverage -o mapq60.cnn -q 60 test.markdup.bam xgen.t750.bed

mapq0.cnn and mapq60.cnn are identical although I see a lot of mapq = 0 reads on IGV.

etal commented 2 months ago

Could be bit rot. Some experimentation showed that even low-quality mappings provided a slightly better copy number signal than removing those reads, so filtering on quality isn't recommended anymore. But MAPQ of 0 is a special case implying unmapped or ambiguously mapped reads. I wouldn't recommend -q 60 (even if it worked) but -q 10 might be appropriate here.

  1. Do the results look better if you remove the MAPQ=0 read alignments from your BAM file before analyzing with CNVkit?
  2. Do you know the source of the MAPQ=0 read alignments? Are they mostly secondary mappings in repetitive parts of the genome?
ishida-md commented 2 months ago

Thank you for your response. I chose MAPQ=60 in the snippet as an extreme example. I have tried pre-filtering with the minimum MAPQ=20 and saw some differences.

  1. Pre-filtering BAMs yields slightly better-looking results in some samples.

unfiltered INF13-1

filtered INF13-1 mapq20

  1. Spurious calls from the unfiltered BAMs tend to come from MAPQ=0 reads that map to genes with paralogues. If I pre-filter the BAMs, these calls are gone.

chr1:13,446,216-13,450,648 (hg19) PRAMEF13 image

chr1:104,291,000-104,299,000 (hg19) AMY1C image

etal commented 2 months ago

Related: https://github.com/etal/cnvkit/pull/912

etal commented 2 months ago

Likely fixed in https://github.com/etal/cnvkit/pull/912. Can you try the latest development version and see if it works for you now?

ishida-md commented 2 months ago

Yes, cnvkit.py coverage -q is now working. Thank you! Could you consider adding -q option to cnvkit.py batch?