etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

genemetrics does not output statistics #577

Open · eriktoo opened this issue 3 years ago

eriktoo commented 3 years ago

Edit: It appears this feature has not yet been implemented. I offer a possible workaround in the comments.

cnvkit version: 0.9.8

The help text for genemetrics lists a number of statistics that can optionally be selected with the corresponding flag:

Statistics available:
  --mean                Mean log2-ratio (unweighted).
  --median              Median.
  --mode                Mode (i.e. peak density of log2 ratios).
  --ttest               One-sample t-test of bin log2 ratios versus 0.0.
  --stdev               Standard deviation.
  --sem                 Standard error of the mean.
  --mad                 Median absolute deviation (standardized).
  --mse                 Mean squared error.
  --iqr                 Inter-quartile range.
  --bivar               Tukey's biweight midvariance.
  --ci                  Confidence interval (by bootstrap).
  --pi                  Prediction interval.
  -a ALPHA, --alpha ALPHA
                        Level to estimate confidence and prediction intervals; use with --ci and --pi. [Default: 0.05]
  -b BOOTSTRAP, --bootstrap BOOTSTRAP
                        Number of bootstrap iterations to estimate confidence interval; use with --ci. [Default: 100]
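To make the --ci, --alpha and --bootstrap options concrete, here is a minimal sketch of a generic percentile bootstrap for the mean of one gene's bin log2 ratios. It is an assumption that cnvkit computes its interval this way (its implementation may weight bins or correct for bias), and the function name bootstrap_ci is hypothetical; the defaults simply mirror the help text (alpha 0.05, 100 iterations).

```python
import random
import statistics


def bootstrap_ci(values, alpha=0.05, n_boot=100, seed=0):
    """Hypothetical percentile-bootstrap CI for the mean of log2 ratios.

    Not cnvkit's implementation; defaults mirror --alpha and --bootstrap.
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = int(n_boot * (1 - alpha / 2)) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    print(bootstrap_ci([0.1, -0.2, 0.3, 0.0]))
```

Because every resampled mean lies between the smallest and largest input value, the interval can never extend past the observed range; with only a handful of bins per gene it is correspondingly coarse.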

However, when statistics are selected, they are not included in the output. In commands.py the arguments are parsed, but they are never passed to reports.do_genemetrics(), which appears to be why the output table lacks the optional statistics.
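The pattern the report points to (flags parsed but never forwarded) can be illustrated with a small standalone sketch. Everything here is hypothetical: STAT_FUNCS, summarize_gene and main are illustrative names, not cnvkit code, and only four of the statistics are modelled. The point is the explicit hand-off of the selected flags to the function that builds the table.

```python
import argparse
import statistics

# Hypothetical statistic table; names mirror a few genemetrics flags, but
# this is NOT cnvkit's code.
STAT_FUNCS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "stdev": statistics.stdev,
    # Standard error of the mean: sample stdev / sqrt(n).
    "sem": lambda xs: statistics.stdev(xs) / len(xs) ** 0.5,
}


def summarize_gene(log2_ratios, stats):
    """Hypothetical stand-in for a per-gene report step.

    Returns one entry per requested statistic, so nothing selected on the
    command line is silently dropped.
    """
    return {name: STAT_FUNCS[name](log2_ratios) for name in stats}


def build_parser():
    parser = argparse.ArgumentParser(prog="genemetrics-sketch")
    for name in STAT_FUNCS:
        parser.add_argument(f"--{name}", action="store_true")
    return parser


def main(argv, log2_ratios):
    args = build_parser().parse_args(argv)
    # Collect the switched-on flags and forward them explicitly; this is the
    # hand-off that appears to be missing between commands.py and
    # reports.do_genemetrics() in 0.9.8.
    selected = [name for name in STAT_FUNCS if getattr(args, name)]
    return summarize_gene(log2_ratios, selected)


if __name__ == "__main__":
    print(main(["--mean", "--median"], [0.1, -0.2, 0.3, 0.0]))
```

Keeping the flag names and the statistic functions in one table means the parser and the report cannot drift apart, which is one way to avoid this kind of bug.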

eriktoo commented 3 years ago

I have what seems to be a viable workaround for the missing statistics in genemetrics: I feed the genemetrics output into segmetrics, and the segmetrics output into call. This makes it possible to run call with the ci or sem filters, and to include tumor purity and B-allele frequency (BAF) information where available, as it is in my case.

A word of caution about using both genemetrics and call, though: both try to infer the sample's sex when it is not given explicitly, and this can transform the data. In my case I have a diploid-X reference. If a sample is detected as male, the log2 ratios of its X chromosome are increased by 1. The problem is that when genemetrics does this for males, call sees the already-shifted values and detects those samples as female. Consequently, the wrong copy number is reported for X-chromosome genes (e.g. 2 copies instead of 1).
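The double adjustment can be reproduced with a toy model. This is a sketch under loudly stated assumptions: looks_male is a made-up threshold rule (NOT cnvkit's sex-inference method), toy_copies ignores purity and simply computes ref_copies * 2**log2, and the chrX bin values are invented for illustration. It only shows why a +1 shift applied at one stage makes the next stage see a "female" sample.

```python
import statistics


def shift_male_x(chrx_log2, is_male):
    """Toy version of the adjustment described above: with a diploid-X
    reference, add 1 to a male sample's chrX log2 ratios."""
    return [v + 1.0 for v in chrx_log2] if is_male else list(chrx_log2)


def looks_male(chrx_log2, threshold=-0.5):
    """Toy sex guess (NOT cnvkit's algorithm): a chrX median far below 0
    suggests a single X copy against a diploid-X reference."""
    return statistics.median(chrx_log2) < threshold


def toy_copies(log2, ref_copies=2):
    """Naive absolute copy number, ignoring purity: ref_copies * 2**log2."""
    return round(ref_copies * 2 ** log2)


# Illustrative, made-up chrX bin values for one male sample.
male_x = [-0.95, -1.05, -1.0, -0.9]

# Stage 1 (genemetrics) infers male and shifts chrX by +1.
shifted = shift_male_x(male_x, looks_male(male_x))

# Stage 2 (call) re-infers sex from the already-shifted values.
print(looks_male(male_x), looks_male(shifted))  # True False
print(toy_copies(statistics.median(male_x)),
      toy_copies(statistics.median(shifted)))  # 1 2
```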

I work around this by running genemetrics with '--gender f' so that no X shift happens at that stage, while letting call infer sex as usual. I've tested this on a small subset of my samples, and the genders are handled correctly.
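The workaround pipeline can be sketched as a helper that builds the three command lines. Assumptions: the output file names are hypothetical; feeding the genemetrics table to segmetrics via -s relies on the behaviour reported in this thread; and '--gender f' is the 0.9.8 spelling used above, so check `cnvkit.py genemetrics --help` for your installed version. The helper only prints the commands; run them yourself (e.g. with subprocess.run) once they look right.

```python
import shlex


def workaround_commands(sample, cnr, cns=None, purity=None, vcf=None):
    """Build the genemetrics -> segmetrics -> call command lines.

    Output file names are hypothetical; adapt them to your own layout.
    """
    genes = f"{sample}.genemetrics.tsv"
    gene_stats = f"{sample}.genes.segmetrics.cns"
    called = f"{sample}.call.cns"

    genemetrics = ["cnvkit.py", "genemetrics", cnr]
    if cns is not None:
        genemetrics += ["-s", cns]
    # Pin sex here so genemetrics does not shift chrX log2 ratios.
    genemetrics += ["--gender", "f", "-o", genes]

    # Use the gene table as the segments whose statistics are computed.
    segmetrics = ["cnvkit.py", "segmetrics", cnr, "-s", genes,
                  "--ci", "--sem", "-o", gene_stats]

    # No sex flag: let call infer sex from the unshifted values.
    call = ["cnvkit.py", "call", gene_stats, "--filter", "ci", "-o", called]
    if purity is not None:
        call += ["--purity", str(purity)]
    if vcf is not None:
        call += ["--vcf", vcf]

    return [genemetrics, segmetrics, call]


if __name__ == "__main__":
    for cmd in workaround_commands("S1", "S1.cnr", "S1.cns", purity=0.7):
        print(shlex.join(cmd))
```

The key design choice is that the sex flag appears only on the genemetrics step, so exactly one stage (call) performs the chrX adjustment.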