cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0

Insertion and deletion on the same haplotype #80

Closed zmx21 closed 3 years ago

zmx21 commented 3 years ago

Hello,

I've noticed that when a deletion follows an insertion, the deletion is incorporated (returned as "-") but the inserted sequence is not included.

For example, in the following case shown in IGV, some reads carry an insertion (GA) followed by a deletion (GG): (screenshot: INDEL_Example)

For these haplotypes, Shorah returned the following sequence: ggcat--ca

However, if I'm not mistaken, the actual sequence should be: gggacatca

Any advice would be greatly appreciated. Should this have been taken care of in pre-processing? Thanks!

DrYak commented 3 years ago

Currently ShoRAH runs only in the coordinate space of the reference genome and doesn't support insertions (yet). It's sadly not a feature we plan to address in the immediate future.
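
To illustrate what working in reference coordinates implies, here is a minimal Python sketch that projects one aligned read onto reference columns using its CIGAR string. It is not ShoRAH's actual code: the function name `project_to_reference` and the CIGAR `2M2I3M2D2M` are assumptions chosen to reproduce the example from this issue. Inserted bases have no reference column, so they are dropped, while deleted reference positions are emitted as "-".

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def project_to_reference(read_seq: str, cigar: str) -> str:
    """Project an aligned read onto reference coordinates (illustrative only)."""
    columns = []
    pos = 0  # current index into read_seq
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: one read base per reference column.
            columns.append(read_seq[pos:pos + n])
            pos += n
        elif op in "IS":
            # Insertion or soft clip: consumes read bases but has no
            # reference column, so the bases are silently dropped.
            pos += n
        elif op in "DN":
            # Deletion or skipped region: reference columns with no read base.
            columns.append("-" * n)
        # H and P consume neither read nor reference.
    return "".join(columns)


# Hypothetical alignment of the haplotype from this issue:
# gg | insertion ga | cat | deletion of gg | ca
print(project_to_reference("gggacatca", "2M2I3M2D2M"))  # ggcat--ca
```

Under these assumptions the output matches the reported sequence: the deletion survives as `--`, and the `ga` insertion disappears because there is no reference position to hold it.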

As a workaround, I would suggest remapping the alignment to a different reference that includes the insert, so that these haplotypes will show 'gggacatca' and the remaining ones will show as 'gg--catca'.

The package smallgenomeutilities, also developed by colleagues here at the CBG-ETHZ, has a tool named convert_reference that can help you remap alignments to different references.

Tell me if this helps.

zmx21 commented 3 years ago

I see, thanks for the prompt reply and helpful suggestion! I'll try the workaround you mentioned.