CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

Preseq lc_extrap: Too many defects in the approximation #405

Closed skchronicles closed 5 years ago

skchronicles commented 5 years ago

Preseq lc_extrap Approximation Error

Preseq lc_extrap fails on samples with extremely low read depth and produces the following error message:

ERROR: too many defects in the approximation, consider running in defect mode.

From a maintainer of preseq:

    Preseq uses the observed duplicate count histogram to extrapolate
    the expected number of new distinct reads will be gained with additional 
    sequencing. This will not work if the duplicate count histogram is not 
    sufficiently full, which may be happening in the 2.5M read samples.
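
To make the "duplicate count histogram" concrete, here is a toy sketch of how such a histogram can be tallied. It is not preseq's code: the function name `duplicate_count_histogram` and the position strings are made up for illustration, and it assumes a read's mapping position is enough to tell duplicates apart.

```python
from collections import Counter


def duplicate_count_histogram(read_keys):
    """Return {j: number of distinct reads observed exactly j times}.

    read_keys is an iterable with one hashable key per sequenced read;
    reads sharing a key are treated as duplicates (an assumption made
    for this illustration only).
    """
    # How many times each distinct read was observed.
    per_distinct_read = Counter(read_keys)
    # How many distinct reads share each observation count.
    histogram = Counter(per_distinct_read.values())
    return dict(sorted(histogram.items()))


# Hypothetical toy input: seven reads covering four distinct positions.
reads = ["chr1:100", "chr1:100", "chr1:100", "chr1:250",
         "chr2:40", "chr2:40", "chr3:7"]
hist = duplicate_count_histogram(reads)
print(hist)  # {1: 2, 2: 1, 3: 1}
```

With very few reads, most of the mass sits in the first one or two bins, which is the "not sufficiently full" situation the maintainer describes.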

Running preseq in defect mode (enabled with the -D flag) solves the problem. Alexei tested the change by running preseq lc_extrap with -D on both good and bad data. With good data, the resulting NRF values are unchanged; with bad data, preseq no longer errors out.
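
For reference, a minimal sketch of what the defect-mode call might look like when assembled from Python. The helper names and the file names `sample.sorted.bam` and `sample.ccurve` are hypothetical, and the exact command used in the Pipeliner rule may differ. The sketch assumes a sorted BAM as input (`-B`) and an explicit output file (`-o`).

```python
import subprocess


def build_lc_extrap_cmd(bam_path, out_path, defect_mode=True):
    """Assemble a preseq lc_extrap command line as a list of arguments."""
    cmd = ["preseq", "lc_extrap", "-B"]  # -B: input is a (sorted) BAM file
    if defect_mode:
        cmd.append("-D")  # -D: run in defect mode
    cmd += ["-o", out_path, bam_path]
    return cmd


def run_lc_extrap(bam_path, out_path, defect_mode=True):
    """Run preseq; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_lc_extrap_cmd(bam_path, out_path, defect_mode),
                   check=True)


# Hypothetical file names, printed rather than executed.
cmd = build_lc_extrap_cmd("sample.sorted.bam", "sample.ccurve")
print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to check that -D is present without needing preseq installed.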