cschin / Peregrine

Peregrine: Fast Genome Assembler Using SHIMMER Index

Accuracy column in preads.ovl #21

Closed mrvollger closed 4 years ago

mrvollger commented 4 years ago

Hi Jason,

Is the 4th column of preads.ovl an estimated identity, or the actual overlap identity?

Thanks! Mitchell

cschin commented 4 years ago

It should be considered an estimated accuracy.

mrvollger commented 4 years ago

I see. Does PG ever calculate DP overlaps before the assembly stage, or is it based only on SHIMMERs and estimated overlaps until the consensus step?

cschin commented 4 years ago

@mrvollger It does do alignment/overlap confirmation. It uses Gene Myers' O(ND) alignment algorithm. The complexity is O(ND), where N ≈ the sequence lengths and D = the number of differences between the two sequences. It is fast, but the difference count is not the same as the one from DP-based methods. That is why I say it is “estimated”.
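
To illustrate why an O(ND) difference count can disagree with a DP-based one, here is a minimal Python sketch. It is not Peregrine's code: the function names `myers_ond_distance` and `levenshtein` are made up for this example, and it assumes the textbook greedy O(ND) formulation, which counts only insertions and deletions. Under that assumption a single mismatch costs 2 differences, whereas a unit-cost DP alignment charges 1.

```python
def myers_ond_distance(a: str, b: str) -> int:
    """Textbook greedy O(ND) search: minimum number of single-character
    insertions plus deletions needed to turn a into b (no substitutions)."""
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]       # step down: consume one character of b
            else:
                x = v[k - 1] + 1   # step right: consume one character of a
            y = x - k
            # Follow the "snake": extend along matching characters for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


def levenshtein(a: str, b: str) -> int:
    """Unit-cost DP edit distance (substitution, insertion, deletion = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # One mismatch: the two methods report different difference counts.
    print(myers_ond_distance("ACGT", "AGGT"), levenshtein("ACGT", "AGGT"))
```

The outer loop runs at most D + 1 times and each pass touches at most D + 1 diagonals, which is why the search is fast when the two sequences are nearly identical.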

mrvollger commented 4 years ago

I see, thanks for the clarification!