emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0
27 stars 10 forks source link

haplotype network from the "haplotype.loci" class #4

Closed emmanuelparadis closed 7 years ago

emmanuelparadis commented 9 years ago

Suggestion from @jhmarcus: to create a haplotype network from the "haplotype.loci" class. It would be great to be able to use read.loci and / or read.vcf for doing this.

jhmarcus commented 9 years ago

Thank you @emmanuelparadis !

thibautjombart commented 8 years ago

Actually, converting any matrix of characters to haplotypes would be awesome. E.g.:

> x=matrix(letters[1:20],ncol=5)
> rownames(x) = paste("indiv",1:4)
> x
        [,1] [,2] [,3] [,4] [,5]
indiv 1 "a"  "e"  "i"  "m"  "q" 
indiv 2 "b"  "f"  "j"  "n"  "r" 
indiv 3 "c"  "g"  "k"  "o"  "s" 
indiv 4 "d"  "h"  "l"  "p"  "t" 
> apply(x,1,paste,collapse="") # haplotypes would be:
indiv 1 indiv 2 indiv 3 indiv 4 
"aeimq" "bfjnr" "cgkos" "dhlpt" 

Would be tremendously useful e.g. for antimicrobial resistance.

emmanuelparadis commented 8 years ago

haplotype() is already generic with methods for "DNAbin" and "loci". We just need to write one for "character" ;)

ONeillMB1 commented 8 years ago

Are there any updates on this? I would like to make a haplotype network from a haplotype.loci object (generated using haplotype() on a variable created using read.vcf()). I came across this post on stack overflow but the solution you suggested does not work for me either.

emmanuelparadis commented 7 years ago

News on this thread: there is now a haplotype.character() method in pegas. Using the toy data from @thibautjombart:

R> h <- haplotype(x)
R> h

Haplotypes extracted from: x 

    Number of haplotypes: 4 
         Sequence length: 5 

Haplotype labels and frequencies:

  I  II III  IV 
  1   1   1   1 

A simple distance function for this object could be:

dist.char <- function (x) 
{
    n <- nrow(x)
    if (n < 2) stop("less than two haplotypes")
    d <- numeric(n * (n - 1)/2)
    k <- 1L
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            d[k] <- sum(x[i, ] != x[j, ])
            k <- k + 1L
        }
    }
    attr(d, "Size") <- n
    attr(d, "Labels") <- rownames(x)
    attr(d, "Diag") <- attr(d, "Upper") <- FALSE
    attr(d, "call") <- match.call()
    attr(d, "method") <- "N"
    class(d) <- "dist"
    d
}

giving:

R> dist.char(h)
    I II III
II  5       
III 5  5    
IV  5  5   5

Maybe need to be improved a bit... The distance matrix can then be used to build a MST with mst, or a MSN with msn, or a RMST with rmst. The last two are new functions documented with mst.

Check out also the new plotNetMDS for cool plotting with rgl.

emmanuelparadis commented 7 years ago

After prospecting a bit, there does not seem to be a general purpose Hamming distance function in R, so I renamed the above as dist.hamming and included it in pegas.

I remind that there is already dist.haplotype.loci for objects of class "loci" and dist.dna(x, "n") for those of class c("haplotype", "DNAbin").

... I guess this closes this thread