etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Germ-line exomes, single samples #673

Open marchoeppner opened 2 years ago

marchoeppner commented 2 years ago

Hi, apologies for using this route, but I am quite stuck with my processing and cannot find a straight answer elsewhere.

What is the issue?

I have clinical exomes, typically single samples from patients across a range of diseases (routine human genetics diagnostics). For these samples, I would like to call putative CNVs using CNVkit. All exomes are sequenced on the same instrument, using the same wet-lab pipeline and exome kit (IDT xGen). Each sequencing run holds up to 48 (unrelated) samples and generates around 100X coverage per sample. Downstream processing consists of alignment (BWA) and duplicate marking against hg38 (without ALT contigs).

1) What is the "best" reference for this setup?

2) BED file: The documentation is a bit unclear here. The option is called "target", but the text mentions "baits" a few times. In exome sequencing, these are two different things: "baits" are the actual stretches used for constructing the RNA baits, whereas "targets" are usually the exons targeted for capture. One or more baits can map to one target (e.g. long exons). So does CNVkit expect baits or targets here? Vendors usually supply BED files for both.
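
To make the bait/target distinction concrete, here is a minimal Python sketch. The coordinates and the names `exonA`/`exonB` are made up for illustration and do not come from any vendor file; the helper `baits_per_target` is hypothetical and not part of CNVkit. It only shows that a short exon may be covered by a single bait while a long exon needs several.

```python
# Hypothetical BED-style intervals (0-based, half-open), for illustration only.
targets = [
    ("chr1", 1000, 1100, "exonA"),  # short exon
    ("chr1", 5000, 5300, "exonB"),  # long exon
]
baits = [
    ("chr1", 990, 1110),
    ("chr1", 4990, 5110),
    ("chr1", 5110, 5230),
    ("chr1", 5230, 5350),
]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def baits_per_target(targets, baits):
    """Map each target name to the list of baits overlapping it."""
    result = {name: [] for _, _, _, name in targets}
    for chrom, t_start, t_end, name in targets:
        for b_chrom, b_start, b_end in baits:
            if chrom == b_chrom and overlaps(t_start, t_end, b_start, b_end):
                result[name].append((b_start, b_end))
    return result


mapping = baits_per_target(targets, baits)
counts = {name: len(hits) for name, hits in mapping.items()}
print(counts)
```

In this toy case the two BED files describe the same regions but with different interval boundaries and interval counts, which is why it matters which one the "target" option expects.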

3) Resolution: The documentation mentions a resolution of >1 Mb for exomes. But the "hmm-germline" method apparently changes this? What is a realistic lower limit on CNV size here?

4) Expected number of CNVs: I have no idea whether my results are anywhere near realistic. Depending on my exact approach, I get anything from a few dozen CNV calls (reference built from all samples in a sequencing run) to as many as 1000 (flat reference). What is a "typical" number for exomes from non-tumor samples?

Thanks for the help! /Marc