broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Question: Are redundant bases inside overlapping ends of paired reads counted once or twice towards consensus? #152

Open JohnUrban opened 1 year ago

JohnUrban commented 1 year ago

Hi all,

Thanks for the great program. I've been using it for years now.

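I may have known this at one point, but I definitely don't know it now. The title says it all: are redundant bases inside the overlapping ends of paired reads counted once or twice towards the consensus?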

It seems like one would want to count each fragment only once, even if two reads cover the same base. That way, certain fragments don't get weighted more heavily than others when computing a consensus from a population of molecules.
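To make the concern concrete, here is a toy Python sketch of the two counting schemes at a single reference position. It does not describe what Pilon actually does (that is the open question); the function names, the fragment IDs, and the choice to skip fragments whose mates disagree are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def read_level_counts(observations):
    """Count every read's base call, so overlapping mates count twice."""
    return Counter(base for _, base in observations)

def fragment_level_counts(observations):
    """Count each fragment once at this position.

    Assumption for this sketch: if the two mates of a fragment disagree,
    the fragment is skipped rather than resolved by base quality.
    """
    bases_by_fragment = defaultdict(set)
    for fragment_id, base in observations:
        bases_by_fragment[fragment_id].add(base)

    counts = Counter()
    for bases in bases_by_fragment.values():
        if len(bases) == 1:  # mates agree, or only one mate covers the base
            counts[next(iter(bases))] += 1
    return counts

# Hypothetical pileup at one position: (fragment ID, base call).
# fragA and fragB are covered by both mates; fragC to fragE by one mate only.
observations = [
    ("fragA", "G"), ("fragA", "G"),
    ("fragB", "G"), ("fragB", "G"),
    ("fragC", "T"),
    ("fragD", "T"),
    ("fragE", "T"),
]

print(read_level_counts(observations))
print(fragment_level_counts(observations))
```

In this made-up pileup, read-level counting favors G (4 calls to 3), while fragment-level counting favors T (3 fragments to 2), which is the kind of reweighting I am worried about.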

I have 2 x 300 bp paired-end reads from MiSeq. My collaborators targeted 600 bp fragments, but the overwhelming majority of fragments are ~300-350 bp, so the paired reads are nearly 100% redundant in most cases.
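For a rough sense of how much of each fragment is read twice, here is a small Python sketch of the overlap arithmetic. It assumes the two mates read inward from opposite ends of the fragment and that any read-through past the fragment end (adapter sequence) has been trimmed; the function names are made up for illustration.

```python
def overlap_bp(read_len: int, fragment_len: int) -> int:
    """Number of fragment bases covered by both mates of a pair."""
    # A read cannot cover more bases than the fragment contains.
    covered_per_read = min(read_len, fragment_len)
    # The two mates together cover 2 * covered_per_read bases; anything
    # beyond the fragment length must be shared between them.
    return max(0, 2 * covered_per_read - fragment_len)

def redundant_fraction(read_len: int, fragment_len: int) -> float:
    """Fraction of the fragment that is sequenced by both mates."""
    return overlap_bp(read_len, fragment_len) / fragment_len

for fragment_len in (300, 350, 600):
    shared = overlap_bp(300, fragment_len)
    frac = redundant_fraction(300, fragment_len)
    print(f"{fragment_len} bp fragment: {shared} bp shared ({frac:.0%})")
```

With 300 bp reads, a 300 bp fragment is covered entirely by both mates, while a 600 bp fragment would have no overlap at all.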

I suppose a related question would be:

I will also post this to the pilon-users mailing list. If I get an answer from one, I will post the answer to the other as well to close both out.

Best,

John