rdinnager opened this issue 6 years ago

A PSV/PSE/PSC/etc. version that can handle really big phylogenies. In the past I have tried to calculate PSV on a phylogeny with several hundred thousand tips, but R will give a 'cannot allocate vector of size 150 GB', or some other ridiculously large value (presumably because it is trying to allocate a huge phylogenetic covariance matrix). This kind of data is not so unusual anymore, with large metagenomics datasets, so I think a memory-efficient version would be really useful. I was thinking it could be done using the bigmemory and bigalgebra packages?
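For a sense of scale, a dense var-cov matrix grows with the square of the tip count; a quick back-of-the-envelope check in R (the tip counts below are just illustrative) reproduces allocations of that order:

```r
# GiB needed for a dense n x n matrix of doubles (8 bytes per element)
vcv_gb <- function(n) n^2 * 8 / 2^30

vcv_gb(2e4)    # ~3 GiB: fine on a laptop
vcv_gb(1.4e5)  # ~146 GiB: about the failed 150 GB allocation
vcv_gb(4e5)    # ~1192 GiB: several hundred thousand tips
```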
Thanks @rdinnager for the issue!

I will update psv later with C++; hopefully C++ will manage memory better. After that, I will test it with a large phylogeny and see what we need to handle such large trees.
Okay, that sounds like a good plan.
Hi @rdinnager, I updated psv with C++. It is now faster than picante::psv. But I am not sure whether it can handle several hundred thousand tips (probably not). The main bottleneck is the memory needed to store the species-by-species phylogenetic var-cov matrix for that many tips...
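For reference, here is a minimal sketch of the dense computation involved, using ape and the PSV formula of Helmus et al. (2007) for a community containing all tips (the simulated tree is just a stand-in; this illustrates where the memory goes, not the package's actual implementation):

```r
library(ape)

tree <- rcoal(1000)  # small simulated tree as a stand-in

# The n x n phylogenetic correlation matrix is the bottleneck:
# its memory footprint grows with the square of the number of tips
C <- vcv(tree, corr = TRUE)

# PSV = (n * trace(C) - sum(C)) / (n * (n - 1))
n <- nrow(C)
(n * sum(diag(C)) - sum(C)) / (n * (n - 1))
```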
Hey @daijiang @rdinnager, I'd recommend bigmemory, since it's pretty simple to interface with using Rcpp (see here) and because it allows you to store matrices on disk. The latter is pretty important because even a direct C++ implementation with no copying of such large matrices will deplete RAM on most computers. I've played around with it, and it seemed pretty intuitive.
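A minimal sketch of the idea, assuming a file-backed matrix works for this use case (the backing-file names and dimensions below are arbitrary placeholders; the backing file needs n^2 * 8 bytes of disk):

```r
library(bigmemory)

# A file-backed big.matrix lives on disk; only the parts actually
# touched are pulled into RAM, so n can exceed available memory
n <- 5e4
C <- filebacked.big.matrix(n, n, type = "double",
                           backingfile    = "vcv.bin",
                           descriptorfile = "vcv.desc")

C[1, 1] <- 1  # reads and writes use ordinary matrix syntax

# On the C++ side, C@address can be passed to an Rcpp function and
# wrapped with XPtr<BigMatrix> / MatrixAccessor<double>
```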
Thanks @lucasnell. I will take a look at it later. Currently, the C++ version can handle a 20k-by-20k matrix on my laptop, which is probably enough for most ecological studies. bigmemory is definitely useful beyond that size.