Closed naurasd closed 8 months ago
Hi @naurasd,
Here's a script simulating your analyses:
library(pegas)
data(woodmouse)
N <- 4L # number of data sets
DATA <- replicate(N, woodmouse[sample.int(15), ], FALSE)
H <- lapply(DATA, haplotype)
D <- lapply(DATA, dist.dna, model = "N")
R <- lapply(D, rmst, quiet = TRUE)
layout(matrix(1:N, 2, 2))
lapply(R, plot)
It works fine for me. Did you upgrade pegas? Version 1.3 was released recently. Cheers, Emmanuel
Hi @emmanuelparadis
I appreciate the quick reply, as always.
After checking again and again, I found the issue. When I am running rmst
, it actually works but fails at a certain point:
> nt <- lapply(d, rmst)
Iteration: 29 Number of links: 135
Iteration: 5 Number of links: 13
Less than 6 observations: doing complete enumeration.
Iteration: 6
Iteration: 4 Number of links: 5
Iteration: 6 Number of links: 16
Iteration: 6 Number of links: 9
Iteration: 5 Number of links: 6
Less than 6 observations: doing complete enumeration.
Iteration: 120
Iteration: 7 Number of links: 11
Less than 6 observations: doing complete enumeration.
Iteration: 120
Less than 6 observations: doing complete enumeration.
Iteration: 6
Iteration: 4 Number of links: 7
Less than 6 observations: doing complete enumeration.
Iteration: 6
Iteration: 6 Number of links: 16
Less than 6 observations: doing complete enumeration.
Iteration: 120
Iteration: 6 Number of links: 14
Iteration: 7 Number of links: 10
Iteration: 5 Number of links: 6
Less than 6 observations: doing complete enumeration.
Iteration: 6
Less than 6 observations: doing complete enumeration.
Iteration: 24
Iteration: 4 Number of links: 5
Iteration: 7 Number of links: 8
Less than 6 observations: doing complete enumeration.
Iteration: 1Error in apply(mat, 1, sort) : dim(X) must have a positive length
So rmst
is running properly, but it throws an error at the first instance where I only have two haplotypes, as is the case for MOTU1154:
>str(d)
List of 143
...
$ MOTU1143: 'dist' num [1:28] 3 4 3 4 5 15 14 1 2 1 ...
..- attr(*, "Size")= int 8
..- attr(*, "Labels")= chr [1:8] "I" "II" "III" "IV" ...
..- attr(*, "Upper")= logi FALSE
..- attr(*, "Diag")= logi FALSE
..- attr(*, "call")= language FUN(x = X[[i]], model = "N")
..- attr(*, "method")= chr "N"
$ MOTU1154: 'dist' num 12
..- attr(*, "Size")= int 2
..- attr(*, "Labels")= chr [1:2] "I" "II"
..- attr(*, "Upper")= logi FALSE
..- attr(*, "Diag")= logi FALSE
..- attr(*, "call")= language FUN(x = X[[i]], model = "N")
..- attr(*, "method")= chr "N"
As you can see, MOTU1154 only has two haplotypes, and therefore the dimensions get messed up. This is where rmst
fails. Does this mean I cannot handle scenarios where I only have two haplotypes with pegas?
Second small question: You are running dist.dna
on DATA
, not on the haplotypes in H
, which is how it is detailed in your documentation. Doesn't seem to be a problem? What's the difference, other than labels beeing different during plotting?
Merci Nauras
Great that you were able to diagnose this!
pegas can handle data with 2 haplotypes: mst()
does it and I'll fix rmst()
to return the MST when n=2.
In the meantime, you can add something like if (length(d) == 1)
THEN call mst()
ELSE call rmst()
. Maybe also add a check if n=1 (if needed) since mst()
would throw an error.
Second small question: You are running dist.dna on DATA, not on the haplotypes in H, which is how it is detailed in your documentation. Doesn't seem to be a problem?
My wrong: I meant D <- lapply(H, dist.dna, model = "N")
What's the difference, other than labels beeing different during plotting?
None with these data because each sequence of the woodmouse dataset defines (unambiguously) a distinct haplotype. Cheers, Emmanuel
Thanks, appreciate the hard work. Hope these small issues will help making the package more efficient!
let me know once the fix has been made, will download the new version then!
It's fixed and pushed here.
You are the best, Merci!
Hi,
I am trying to create a list of random minimum spanning trees for a list of distance matrices using the
rmst()
function.So I created a list of DNAbin objects called
seqs_list
. Then I extract the haplotypes for all the DNAbin objects using thehaplotype()
function and create a list of distance matrices calledd
for all the haplotypes objects. I then want to continue to apply thermst()
function to the list of distance matrices, but I fail with the error shown below:I tried to use a for loop just to try an alternative, but it gave the same error:
I know that this is not a
pegas
issue, but I am trying to understand what the issue is. Because when I access one specific distance matrix fromd
, then this works, so my distance matrices ind
are of the correct format:Do you have an idea why lapply doesn't work with
rmst()
but it works withhaplotype()
anddist.dna
?Thank you Nauras