Closed nicolas-bertin closed 8 months ago
While modifying the metric mean_insert_size so that it is computed with samtools stats, it appears that removing duplicate reads makes only a marginal difference on the 30x good-quality data.
However, removing duplicate reads implies running samtools stats twice (with and without duplicates) for the metrics pct_reads_mapped and pct_reads_properly_paired: heavy compute for a marginal gain.
One remaining concern is poor-quality data with a higher number of duplicate reads, where the difference may not be marginal.
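To make the trade-off concrete, here is a minimal Python sketch of how the three metrics could be read from the "SN" (summary numbers) section of samtools stats output, with an optional second run that excludes duplicates via the `-d` (`--remove-dups`) option. The function names are hypothetical, and using "raw total sequences" as the denominator for the percentages is an assumption; the reference implementation may define them differently.

```python
import subprocess


def samtools_stats_cmd(bam_path, remove_dups=False):
    """Build the samtools stats command; -d excludes reads marked as duplicates."""
    cmd = ["samtools", "stats"]
    if remove_dups:
        cmd.append("-d")
    cmd.append(bam_path)
    return cmd


def parse_summary_numbers(stats_text):
    """Collect the tab-separated 'SN' lines into a {name: value} dict."""
    summary = {}
    for line in stats_text.splitlines():
        fields = line.split("\t")
        # SN lines look like: SN<TAB>reads mapped:<TAB>990[<TAB># comment]
        if len(fields) >= 3 and fields[0] == "SN":
            summary[fields[1].rstrip(":")] = float(fields[2])
    return summary


def qc_metrics(summary):
    """Derive the three metrics (denominator choice is an assumption)."""
    total = summary["raw total sequences"]
    return {
        "mean_insert_size": summary["insert size average"],
        "pct_reads_mapped": 100.0 * summary["reads mapped"] / total,
        "pct_reads_properly_paired": 100.0 * summary["reads properly paired"] / total,
    }


def run_stats(bam_path, remove_dups=False):
    """Run samtools stats once and return the derived metrics."""
    result = subprocess.run(
        samtools_stats_cmd(bam_path, remove_dups),
        check=True, capture_output=True, text=True,
    )
    return qc_metrics(parse_summary_numbers(result.stdout))
```

Calling `run_stats(bam)` and `run_stats(bam, remove_dups=True)` on the same file would give the two sets of values to compare, at the cost of the second pass over the data discussed above.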
Summary of suggestions:
use samtools stats instead of Picard, since the reference implementation already computes samtools stats to gather other metrics
proposal to improve the mean_insert_size metric definition
from https://github.com/ga4gh/quality-control-wgs/blob/e692682078a3f47b8160cc1ef74614227264b847/metrics_definitions/metrics_definitions.md?plain=1#L83-L88
to [Edit] https://github.com/ga4gh/quality-control-wgs/blob/bddde198a26f1311a0188f8792b71d6fe704949d/metrics_definitions/metrics_definitions.md?plain=1#L59-L65
see #7