ericgoolsby / Rphylopars

Phylogenetic Comparative Tools for Missing Data and Within-Species Variation
28 stars 11 forks source link

phylopars, mvOU models (full matrix, and diagonal alpha) not working #54

Open NevMagi opened 1 year ago

NevMagi commented 1 year ago

Dear @ericgoolsby , dear Rphylopars community,

Although I have succesfully used phylopars to fit the Ornstein-Uhlenbeck model with alpha matrix before, it seems not to work anymore. Although I first thought it had something to do with the data structure, trying it again with the simulated data in the wiki gives the same error.

As a remark, I ran into the issue described at (https://github.com/ericgoolsby/Rphylopars/issues/21) as well. Not log-transforming the data seems to help in this regard! Maybe I am still missing something about the necessary input data structure.

But the consistent error I keep encountering when running:

p_mvOU <- phylopars(trait_data = phenotypes, tree = tree = "mvOU")

or

p_mvOU_diag <- phylopars(trait_data = phenotypes, tree = tree,
                         model = "mvOU",full_alpha = FALSE)

is:

Error in (function (trait_data, tree, model = "BM", pheno_error, phylo_correlated = TRUE,  : 
  unused arguments (name.check = function (phy, data, data.names = NULL) 
{
    if (is.null(data.names)) {
        if (is.vector(data)) {
            data.names <- names(data)
        } else {
            data.names <- rownames(data)
        }
    }
    t <- phy$tip.label
    r1 <- t[is.na(match(t, data.names))]
    r2 <- data.names[is.na(match(data.names, t))]
    r <- list(sort(r1), sort(r2))
    names(r) <- cbind("tree_not_data", "data_not_tree")
    class(r) <- "name.check"
    if (length(r1) == 0 && length(r2) == 0) {
        return("OK")
    } else {
        return(r)
    }
}, col_rename = c(FALSE, FALSE, FALSE, FALSE), row_rm = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), col_rm = c(FALSE, FALSE, FALSE, FALSE))

I figured it might have something to do with name.check, which is part of Geiger. I read that in Rphylopars version 0.3.9 (which I am now running), the dependency on Geiger was removed.(https://github.com/ericgoolsby/Rphylopars/blob/master/R/phylopars_update.R) in line 189. Could it be that running the devtools installation of Rphylopars does not include the part copied from Geiger?

Pardon me if this is an obvious question,

All the best, Neville

ericgoolsby commented 1 year ago

Oh, yikes. I've managed to get my devel branch behind the master branch somehow. My last update was a hasty fix to satisfy CRAN, and I seem to have gotten my branches mixed up.

Anyway, as quick solution to the error you're receiving, if you include the argument usezscores = FALSE in your phylopars() call, does that run successfully allow you to run with model = "mvOU"? (I'm not sure why exactly but it seems to avoid triggering the error).

I'll work on seeing what's up with the branches. I've been a little hesitant to edit the current code much because I've been focusing my efforts on a complete overhaul that uses a much more stable (and less confusing) algorithm from Bastide et al ("Efficient Bayesian inference of general Gaussian models on large phylogenetic trees." The Annals of Applied Statistics 15.2 (2021): 971-997). It should make the package much more numerically stable (avoiding #21 issues) and way simpler to update, and there I'll be able to add a few features like gradient checks. It's been difficult to cobble together enough time to make that happen but I'm hopeful it can be done before the end of the calendar year!

NevMagi commented 1 year ago

Oh, no! That explains, thanks for your very short-term response. Running usezscores = FALSE worked! I am now trying to figure out how to best deal with log-transforming my data, as this choice seems to influence relative AIC scores of the different evolutionary models I am fitting.

That sounds really good! Let me thank you for all the work keeping Rphylopars updated, and also continually improving!

diogoprov commented 8 months ago

Hi, I don't know if you already had a chance to update the package, but I'm still getting this #21 error. My phylogeny is pretty small , with only 13 species, but without any branch length = 0 (please, see below).

> B$edge.length
 [1] 0.05583675 0.04960969 0.05804405 0.06391603 0.10849922
 [6] 0.10468260 0.10468260 0.21318181 0.04631970 0.23077815
[11] 0.07129389 0.09276476 0.06671950 0.06671950 0.15948425
[16] 0.13292765 0.20221424 0.20221424 0.38475159 0.08755140
[21] 0.20661633 0.14642060 0.14642060 0.35303693

I have tried all the solutions presented in the #21 and here too, with usezscores = FALSE and pheno_error = FALSE, but nothing suffices. I have tried to force the tree to be ultrametric. My data appears to be right:

    species   BL   TAL    TL   ED   BH  MTH  TMH  DFH
1     alagoana 6.89 10.51 17.40 0.95 4.27 4.95 1.71 1.63
2    schubarti 6.60 11.90 18.50 0.90 3.90 5.10 1.80 1.40
3     lacrimae 6.96 13.39 18.49 0.99   NA 3.77   NA   NA
4 albopunctata 8.80 11.20 18.60 1.10   NA 6.40 2.20   NA
5  altomontana 9.84 13.45 22.98 0.91 5.92 6.97 2.83 2.56
6  mantiqueira 5.30  8.79 14.17 0.86 3.60 4.60 1.72 1.37

Any suggestions?

ericgoolsby commented 7 months ago

Hi there! The package update is still in progress, but it turned out to be way more of a challenge than I had anticipated. The update that's in development includes a revised EM algorithm and analytic gradient to make parameter optimization faster and more stable, but they both require some very challenging recursive formulas for the OU model, so progress has been quite slow. It's coming along, just not on the timeline I had hoped for.

Regarding the error you're getting, I think the large number of traits (relative to number of species) is the culprit. As the number of traits approaches the number of species, the maximum likelihood trait covariance matrix will approach singularity, so numerical optimization can fail. It's even worse for a multivariate OU problem, since the alpha (sometimes called "selection") matrix becomes even more unstable when the number of species is small relative to the number of traits. In any case, I don't think a multivariate model is going to work for your data. But you do have other options depending on the nature of your question (e.g. Bayesian phylogenetic regression with the brms package, or possibly using Rphylopars to fit the model on subset pairs of traits, etc). Sorry to not have a more positive answer, but hopefully that helps.

diogoprov commented 7 months ago

No problem at all! Thanks a lot anyway for the feedback. The n:p ratio is indeed not ideal for this data set and I fully agree that this is likely causing the error, as you describe. As our data set is indeed small in terms of species, I believe phylogeny won't play a large role in it anyway, so we'll use an imputation technique that disregards phylogeny.