etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

how to accurately select samples from cnvkit.py metrics *.cnr -s *.cns #651

Open worker000000 opened 3 years ago

worker000000 commented 3 years ago

Thanks a lot. I have many normal samples, and I use cnvkit.py metrics *.cnr -s *.cns to find the noisy ones. The CNVkit docs (https://cnvkit.readthedocs.io/en/stable/reports.html#metrics) describe several metrics to check, but applying them in practice is not that easy.

Some samples, marked with a red arrow, should probably be removed because of their high segment counts. Is there any other advice?

image

worker000000 commented 3 years ago

@etal @tetedange13 Any comments on this? Thanks a lot.

etal commented 3 years ago

It can be helpful to plot each of these columns and look for outliers visually. In your case I'd recommend opening this table in a spreadsheet, sorting by each of the columns individually, and looking for any extreme values either numerically or by plotting. The average coverage depth of each sample, or the number of reads in the BAM, is also a useful heuristic. If the same samples are being used for other 'omics analyses, it can be wise to use consistent sample acceptance criteria across analyses.
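For anyone who prefers a script to a spreadsheet, the sort-by-each-column step could be sketched roughly as below. This is only an illustration, not part of CNVkit: it assumes the metrics output was saved as a tab-separated table with the columns sample, segments, stdev, mad, iqr and bivar, the helper names load_metrics and most_extreme are hypothetical, and the numbers are made up.

```python
import csv
import io

# Made-up values for illustration only; a real table would be the saved
# output of the metrics command (assumed here to be tab-separated).
METRICS_TSV = """sample\tsegments\tstdev\tmad\tiqr\tbivar
normal_A.cnr\t42\t0.21\t0.15\t0.20\t0.16
normal_B.cnr\t38\t0.19\t0.14\t0.19\t0.15
normal_C.cnr\t45\t0.23\t0.16\t0.22\t0.17
normal_D.cnr\t310\t0.61\t0.44\t0.58\t0.47
normal_E.cnr\t40\t0.20\t0.15\t0.21\t0.16
"""


def load_metrics(handle):
    """Read the table into a list of dicts, converting metric columns to float."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {"sample": record["sample"]}
        for key, value in record.items():
            if key != "sample":
                row[key] = float(value)
        rows.append(row)
    return rows


def most_extreme(rows, column, n=2):
    """Return the n samples with the largest value in the given column."""
    ranked = sorted(rows, key=lambda r: r[column], reverse=True)
    return [(r["sample"], r[column]) for r in ranked[:n]]


rows = load_metrics(io.StringIO(METRICS_TSV))
for column in ("segments", "stdev", "mad", "iqr", "bivar"):
    # Samples that top several of these lists are candidates for a closer look.
    print(column, most_extreme(rows, column))
```

A sample that ranks at the top of several columns at once (normal_D.cnr in this toy table) is the kind of outlier worth inspecting by eye before deciding whether to drop it.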

worker000000 commented 2 years ago

@etal thanks a lot. We often keep values within [mean - 3 sigma, mean + 3 sigma]. Can this rule also be applied to filter samples based on the output of the metrics command?
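To make the question concrete, the [mean - 3 sigma, mean + 3 sigma] rule applied to one metrics column might look like the sketch below. It is not a CNVkit feature and not a recommendation from the maintainers; the function name outside_k_sigma is hypothetical, the choice of which column to filter on is an assumption, and the values are made up.

```python
import statistics


def outside_k_sigma(values, k=3.0):
    """Return sample names whose value lies outside mean +/- k * sample stdev."""
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    low, high = mean - k * sd, mean + k * sd
    return sorted(name for name, v in values.items() if v < low or v > high)


# Made-up values for a single metric column (e.g. one spread metric per sample).
metric = {
    "s01": 0.10, "s02": 0.11, "s03": 0.09, "s04": 0.10,
    "s05": 0.12, "s06": 0.08, "s07": 0.10, "s08": 0.11,
    "s09": 0.09, "s10": 0.10, "s11": 0.10, "s12": 1.00,
}

flagged = outside_k_sigma(metric)
print(flagged)
```

One caveat with this rule: the outliers themselves inflate the mean and standard deviation, so in a small cohort a noisy sample can fail to exceed the 3 sigma bound at all. A median and MAD based cutoff is a common, more robust alternative.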