NorwegianVeterinaryInstitute / Talos

A shotgun metagenomic analysis pipeline using nextflow
BSD 3-Clause "New" or "Revised" License

Should I add removal of optical/artificial duplicates? #7

Closed Thomieh73 closed 4 years ago

Thomieh73 commented 4 years ago

Removal of optical or artificial duplicates is often performed in whole-genome sequencing. For metagenomics, however, the literature is mixed. There are tools and workflows that remove duplicated reads, e.g. prinseq and YAMP. But there are also arguments that this is not so beneficial for metagenomes, and that one throws away data that could be used. For instance, exact duplicates can arise when two DNA molecules from two bacterial cells of the same population are sequenced. This is especially likely in low-complexity samples, where it is easier to sample the whole community.

I should be able to test this with mock samples.

Thomieh73 commented 4 years ago

I checked this online.

PCR duplicates are created when the sequencing library is made using a PCR step. With the PCR-free libraries that we mostly use for metagenomics, this should not be a big issue. So for now I will not add this step, but during quality control it will be good to see how many duplicates there actually are in the datasets, compared to the sequencing coverage of the metagenome.
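
For the quality-control check mentioned above, a rough first number could come from counting exact-duplicate read sequences in a FASTQ file. The sketch below is not part of Talos; the function names are made up for illustration. It assumes standard 4-line FASTQ records and single-end reads, and it only detects exact sequence matches, so it cannot tell PCR or optical duplicates apart from natural duplicates that come from two cells of the same population.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def duplication_rate(sequences: Iterable[str]) -> float:
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence counts once as an original; the rest are duplicates.
    return (total - len(counts)) / total


def fastq_duplication_rate(path: str) -> float:
    """Hypothetical helper: duplication rate for a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return duplication_rate(read_sequences(handle))
```

This keeps every distinct sequence in memory, so for large metagenomes it would probably need to be run on a subsample of reads. Running it on the mock samples should give a baseline to compare the real datasets against.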