epi2me-labs / wf-artic

ARTIC SARS-CoV-2 workflow and reporting
https://labs.epi2me.io/

Downsampling algorithm over-downsamples at some positions, lowering depth of coverage and causing positions to be masked. #17

Closed DABAKER165 closed 1 year ago

DABAKER165 commented 2 years ago

For midnight, we have a sample that had a depth of coverage of at least 20 at every position (when I changed the hardcoded script to not downsample). However, with downsampling turned on, a region of 75 positions with 20-26 depth of coverage dropped to 16-19 and was then masked. Unlike Artic V3/4, midnight/rapid chops up the reads, so each read does not completely cover the primer region. However, it appears that the downsampling does not take this into consideration: as soon as a position between the primers reaches the hardcoded depth of coverage of 200 (or 200 reads map to the primer region), it will not consider any more reads. This means that these borderline-coverage regions are going to be systematically masked when coverage is close to 20, when they would not have been masked if the algorithm had been allowed to continue. This is not an isolated event; it occurred on several samples in a single ONT run. And since the primers amplify in a similar ratio, we will systematically mask certain regions more often.
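A toy model may make the effect described above easier to see. This is only a sketch of the behaviour as reported in this comment, not the actual ARTIC or wf-artic code: the function names, the 100-position region, and the read layout are all hypothetical, and the cap is modelled simply as "keep the first 200 reads".

```python
from typing import List, Tuple

Read = Tuple[int, int]  # half-open interval [start, end) on the reference


def depth(reads: List[Read], start: int, end: int) -> List[int]:
    """Per-position depth of coverage over [start, end)."""
    cov = [0] * (end - start)
    for r_start, r_end in reads:
        for pos in range(max(r_start, start), min(r_end, end)):
            cov[pos - start] += 1
    return cov


def cap_by_read_count(reads: List[Read], max_reads: int) -> List[Read]:
    """Toy cap (assumption): keep reads in input order, stop at max_reads."""
    return reads[:max_reads]


# Hypothetical fragmented reads on a 100-position region: for every 12 reads
# covering [0, 60) there is one read covering [40, 100), repeated 25 times.
reads = ([(0, 60)] * 12 + [(40, 100)]) * 25

full = depth(reads, 0, 100)
capped = depth(cap_by_read_count(reads, 200), 0, 100)

print(min(full))    # 25: every position is at or above 20 without the cap
print(min(capped))  # 15: positions 60-99 fall below 20 and would be masked
```

In this made-up layout the cap is filled mostly by reads from the well-covered end, so the thinly covered end loses reads it could not spare.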

DABAKER165 commented 2 years ago

I would propose a second criterion: stop downsampling only after the maximum is reached and the minimum of 20 is reached across all positions of each region between each primer set. If the maximum is reached and the read does not impact a position that is below 20 depth of coverage, then it can be removed. Otherwise it should continue to look for reads.
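The proposed rule could be sketched roughly as follows. This is one possible reading of the proposal, not an existing wf-artic or ARTIC function: `downsample_with_min_depth`, its parameters, and the example reads are hypothetical, and reads are modelled as half-open `(start, end)` intervals processed in input order.

```python
from typing import List, Tuple

Read = Tuple[int, int]  # half-open interval [start, end) on the reference


def downsample_with_min_depth(
    reads: List[Read],
    start: int,
    end: int,
    max_reads: int = 200,
    min_depth: int = 20,
) -> Tuple[List[Read], List[int]]:
    """Keep reads up to max_reads, then only reads that rescue low positions."""
    cov = [0] * (end - start)
    kept: List[Read] = []
    for r_start, r_end in reads:
        lo, hi = max(r_start, start), min(r_end, end)
        if lo >= hi:
            continue  # read does not overlap the region
        if len(kept) >= max_reads:
            if min(cov) >= min_depth:
                break  # maximum reached and every position is deep enough
            if all(cov[p - start] >= min_depth for p in range(lo, hi)):
                continue  # read touches no position below min_depth: drop it
        kept.append((r_start, r_end))
        for p in range(lo, hi):
            cov[p - start] += 1
    return kept, cov


# Hypothetical fragmented reads: 12 reads on [0, 60) per read on [40, 100).
reads = ([(0, 60)] * 12 + [(40, 100)]) * 25

kept, cov = downsample_with_min_depth(reads, 0, 100)
print(len(kept))  # 205: five reads beyond the cap are kept
print(min(cov))   # 20: the thin region reaches the minimum depth
```

With this layout a plain cap of 200 reads would leave positions 60-99 at depth 15, while the extra rule keeps just enough additional reads to bring them to 20.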

mattdmem commented 1 year ago

Thanks @DABAKER165, sorry for the very late reply. We're in the process of assessing our artic implementation and will consider this when we have a replacement.