Haplotype network gaps

FischHa commented 1 month ago

Dear @emmanuelparadis,

Thank you for maintaining the package pegas. I'm a vet and doctoral student, and I'm currently working with NGS data obtained from parasites. The "problem" is that the sequences include an intron region, which results in a lot of gaps in the alignment. For our purposes it would be interesting to take those gaps into account when computing a haplotype network. So far I haven't found a way to include the gaps: they are either ignored or removed. Is there a possibility, similar to the function gap_as_state() from phangorn, to turn the gaps "-" into information that is taken into account when calculating a haplotype network with pegas? Or do you have any other idea how to solve this problem?

I attached a subsample of my sequences - the gaps are rather large in some of the sequences (in the last one the intron is not present at all and therefore, the gap is very large). I hope you can help me with this issue and I'm looking forward to your response.

Best regards,

Hannah

Subsample_NGS-data.txt

emmanuelparadis commented 3 weeks ago

Dear Hannah,

I had a quick look at your data. It's an interesting case because there seem to be a lot of "nested" indels and many substitutions (resulting in SNPs/MNPs) within them, so dropping the positions with at least one gap (which is the default in dist.dna()) discards a good deal of information. I'm not sure how best to proceed in this situation. Perhaps a starting point would be to use DNAbin2indel() to find the unique indels. A similar question was asked some months ago related to MJN (see issue #87). I will have a closer look in the next few days.
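To make the contrast concrete, here is a minimal sketch in plain Python. It is not pegas or ape code, and it does not reproduce what dist.dna() or DNAbin2indel() do internally. The function names and the three-sequence toy alignment are hypothetical and are not taken from the attached data. It compares a difference count that drops every column containing a gap with one that treats "-" as a fifth state, and then lists the distinct gap runs as (start, length) pairs with 0-based starts.

```python
import re

# Hypothetical toy alignment (NOT the attached data): three aligned sequences.
alignment = [
    "ACGTACGT",
    "AC--ACGA",
    "ACG-TCGT",
]

def hamming_complete_deletion(a, b, aln):
    """Count differences between a and b, skipping every column in which
    at least one sequence of the whole alignment has a gap."""
    gap_free = [i for i in range(len(a)) if all(s[i] != "-" for s in aln)]
    return sum(a[i] != b[i] for i in gap_free)

def hamming_gap_as_state(a, b):
    """Count differences between a and b, treating '-' as a fifth state.
    Each gap column counts as one difference, so long indels weigh heavily."""
    return sum(x != y for x, y in zip(a, b))

def indel_blocks(seq):
    """Return the gap runs of one sequence as a set of (start, length) pairs."""
    return {(m.start(), len(m.group())) for m in re.finditer(r"-+", seq)}

# Distinct gap runs across the alignment, each run counted once.
unique_indels = sorted(set().union(*(indel_blocks(s) for s in alignment)))

if __name__ == "__main__":
    a, b = alignment[0], alignment[1]
    print(hamming_complete_deletion(a, b, alignment))  # 1
    print(hamming_gap_as_state(a, b))                  # 3
    print(unique_indels)                               # [(2, 2), (3, 1)]
```

In this toy case the first two sequences differ at one site under complete deletion but at three sites when gaps count as a state. Counting each gap column separately treats a long indel as many independent events, which is one reason to work with whole gap runs instead.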

As a side note, there seems to be a small error in the (sub)alignment you sent: in "Sequence28", if you swap the G in position 146 with the two gaps that follow, the latter align with two gaps in other sequences. Of course, that depends on the other sequences in the full alignment.

Best,

Emmanuel

FischHa commented 3 weeks ago

Dear Emmanuel,

thanks a lot for your fast response and your side note. I will check whether there is an error in the larger alignment or whether it is only due to the subsampling of the sequences in the alignment. I read issue #87 and, like Chatchamew, I think it would be pretty revolutionary if it were possible to include base deletions/insertions within haplotype()... ;-) I'm looking forward to your ideas.

Best,

Hannah

emmanuelparadis / pegas

Haplotype network gaps #92