emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0
27 stars 10 forks source link

[pegas 1.2] Median Joining network with wrong number of mutations ? #78

Closed ColineRoyaux closed 1 year ago

ColineRoyaux commented 1 year ago

Hi !

I'm trying to make a median joining network with my data and I noticed that the number of mutations is wrong for some cases in the network... For the DNA sequence input I converted a character list of my sequences into a table with each line being an haplotype and each column a single nucleotide of the sequence in the right order, then I used ape::as.DNAbin to convert the table into a DNA object named hapseq_DNA. Then, I run the following script:

mj_net <- pegas::mjn(hapseqDNA, prefix = "mv") Adding median vectors: 4 4 Building the network... number of clusters: 7 5 4 2 1 attr(mjnet, "labels") <- gsub("mv[0-9]+", "", labels(mj_net)) #Remove the names of median vectors for it to not display pegas::setHaploNetOptions(scale.ratio = 5) plot(mj_net, labels = TRUE, size = 7)

I obtain the following network (Median vectors with no labels) :

image

Everything seems fine but when I run the following line : ape::dist.dna(hapseq_DNA, "N") I obtain the following distance matrix :

image

I checked on different softwares and this distance matrix is right so the input sequences used to compute the mjn are right as well

Most of the network is right but for example, there is 18 mutations between h01 and h06 but the network displays 30 mutations between h01 and h06, there is 19 mutations between h01 and h03 but the network displays 21 mutations and there are other similar cases. Have you got an idea on the origin of my problem ?

Thank you so much !

emmanuelparadis commented 1 year ago

Hi, Often these networks have alternative paths so you expect some kind of differences as you observed. You can do cophenetic(mj_net) to compute the (shortest) distances on the network but be aware that these distances are calculated "roughly" (i.e., there is not an exhaustive search of all possible paths between all nodes of the network). You can also display all the links of the network by setting threshold = c(1, 1e4)) when plotting the network. You may also try to increase the value of epsilon in mjn to increase the search for median-vectors (see the examples in ?mjn). Best,

ColineRoyaux commented 1 year ago

Hi,

Thank you so much for your quick response !

Best