Closed jaum20 closed 1 year ago
It's hard to answer your question since I don't know the algorithm used by DNAsp to extract haplotypes.
You may have a look at the function del.colgapsonly
in ape but it considers only gaps. You can a finer control of which sites to delete by applying the function base.freq
to each column of the data.
More simply, you can just export the haplotypes from DNAsp into a FASTA file.
Thank you for your answer.
More simply, you can just export the haplotypes from DNAsp into a FASTA file.
The export file options in DNAsp are nexus, arlequin and Roehl (network). How can I import the nexus file generated by DNAsp into R as an 'haplotype' object?
ape has the function read.nexus.data
, see its help page. So you may try something like:
library(pegas) # will load ape too
x <- read.nexus.data("myfilefromDNAsp.nex")
x <- nexus2DNAbin(x)
h <- haplotype(x)
h <- haplotype(x)
Unfortunately that not worked:
In haplotype.DNAbin(x) : no segregating site detected with these options
If you can share your data, I'll have a look
Sorry for the late response. Can you provide an e-mail address? I do not feel comfortable sharing a public link with my sequences
My adress is in the DESCRIPTION file as maintainer of the package :)
I just sent you an email with the data. Sorry for the delay
No worry for the delay :) I had a look at your data. Here's what you can do:
x <- read.dna("ITS_pruned.DnaSP.fas", "f")
image(x)
It shows that several positions (around 80-180) are polymorphic but some sequences have long stretches of gaps at these positions. So it makes sense that DNAsp misses these. Just to be clear on the way pegas::haplotype()
handles these N's which are inserted in place of the leading/trailing gaps, in the following alignment the 1st position is considered as polymorphic but not the 2nd one:
NN
GA
AA
(If you set the option strict = TRUE
both positions are treated as polymorphic). This generalizes to other ambiguous bases; in the next alignment, the 1st position is treated as polymorphic but not the 2nd one:
YR
GG
AA
If you want to reproduce the results from DNAsp, the function ape::del.colgapsonly
can be used:
haplotype(del.colgapsonly(x, threshold = 0.01))
which outputs 4 haplotypes (but considers 390 positions instead of the 858 in the oiriginal data set).
Thank you very much for your help and explanation 🤝
The same matrix (73 DNA sequences) produces very different results in the two programs. In DNAsp, 4 haplotypes are generated (gaps not considered and excluding sites with gaps / missing data) and with pegas I got 67 haplotypes!
I tried all sort of things: replace gaps with Ns, question marks, and all combinations of the strict and trailingGapsAsN arguments, all give me dozens of haplotypes but none gives me the 4 haplotypes wich works well for my data.
How can I exclude the sites with gaps and missing data to get the same result as DNAsp?