connor-lab / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
GNU Affero General Public License v3.0

Remove mpileup depth limit in consensus generation #51

Closed dkj closed 4 years ago

dkj commented 4 years ago

The current samtools depth reduction technique is not suitable for high-depth NovaSeq output combined with end-of-amplicon (ligated/tailed) adapters.

daviesrob commented 4 years ago

I'll add some more information on why this is necessary. The samtools mpileup depth limit isn't very clever: it drops any read that would cause the depth limit to be exceeded, even if that read is needed for a lower-depth region later. This causes a shadowing effect, where the depth drops too far in regions adjacent to where the depth limit kicks in.

For amplicon data, where there are deep blocks starting and ending at well-defined locations with short overlaps, the result of applying the depth limit can be catastrophic. In the worst case, entire blocks of reads may be dropped, leading to incorrect or missing calls. If the depth remains high, the effect can cascade along the entire genome, leaving alternating regions of high and low coverage, as shown in the attached graph.

[Figure: depth_limit_effect]
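To make the shadowing effect concrete, here is a toy simulation in Python. It is a deliberately simplified model of a "drop the read if the limit is already reached" rule, not the actual samtools/htslib code. The amplicon coordinates, read counts, the limit of 100 and the function names `apply_depth_limit` and `depth_at` are all made up for illustration.

```python
def apply_depth_limit(reads, max_depth):
    """Keep a read only if fewer than max_depth kept reads overlap its start.

    reads: iterable of (start, end) half-open intervals.
    Toy model of a greedy depth cap; not samtools' real implementation.
    """
    kept = []
    active_ends = []  # end coordinates of kept reads still covering the current start
    for start, end in sorted(reads):
        # Forget kept reads that finished before this read begins.
        active_ends = [e for e in active_ends if e > start]
        if len(active_ends) >= max_depth:
            continue  # dropped, even if it is needed further downstream
        kept.append((start, end))
        active_ends.append(end)
    return kept


def depth_at(reads, pos):
    """Number of reads covering position pos."""
    return sum(1 for start, end in reads if start <= pos < end)


# Three hypothetical amplicons with 50 bp overlaps, 1000 full-length reads each.
amplicons = [(0, 400), (350, 750), (700, 1100)]
reads = [amp for amp in amplicons for _ in range(1000)]
kept = apply_depth_limit(reads, max_depth=100)

for pos in (200, 375, 500, 725, 900):
    print(pos, depth_at(reads, pos), depth_at(kept, pos))
```

In this model the 100 reads kept from the first amplicon are still active when the second amplicon starts, so every read of the second amplicon is discarded. Position 500 goes from a raw depth of 1000 to 0, while the first and third amplicons are capped at 100: the alternating high/low pattern described above.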

In a recent example we found, the depth limit converted a very high-depth C call:

Strand     A       C   G  T
Fwd       82  209731  13  3
Rev     1649     469   4  0

into an A:

Strand     A       C   G  T
Fwd        0       4   0  0
Rev     1649     469   4  0

As can be seen, almost all the forward reads were dropped, leading to the miscall.
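The arithmetic behind the flipped call can be checked with a few lines of Python. This sketch uses a naive majority vote over the summed strand counts, which is an assumption for illustration and not necessarily how the pipeline's consensus caller decides; `call_base` is a hypothetical helper.

```python
BASES = "ACGT"


def call_base(fwd, rev):
    """Sum per-base counts over both strands and return (top base, totals).

    Naive majority vote for illustration only; a real caller applies
    further filters (base quality, strand bias, minimum depth, ...).
    """
    totals = {b: f + r for b, f, r in zip(BASES, fwd, rev)}
    return max(totals, key=totals.get), totals


# Counts in A, C, G, T order, copied from the two tables above.
before = call_base(fwd=(82, 209731, 13, 3), rev=(1649, 469, 4, 0))
after = call_base(fwd=(0, 4, 0, 0), rev=(1649, 469, 4, 0))

print(before)  # ('C', {'A': 1731, 'C': 210200, 'G': 17, 'T': 3})
print(after)   # ('A', {'A': 1649, 'C': 473, 'G': 4, 'T': 0})
```

With the forward strand intact, C outnumbers A by more than 100 to 1. Once the forward reads are gone, only the reverse-strand counts remain, where A happens to dominate.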