tanhuizhen closed this issue 5 years ago.
Are you trying to run EEMS on a Mac (as you are compiling it with `make darwin`)? It will be much easier to install Boost and Eigen with Homebrew. Can you try that?
Yes, I am installing on a Mac. I will try Homebrew and report back!
Dear Dr. Petkova,
Thank you for the advice to try Homebrew; the installation of EEMS was successful.
After installing bed2diffs, I ran into another problem: bed2diffs is unable to read my .bed/.bim/.fam files. I ran bed2diffs with this command:
```
/Users/fuzzywuzzy/Downloads/eems-master/bed2diffs/src-wout-openmp/bed2diffs_v1 --bfile /Volumes/HonoursTQ/HuiZhen/Nume_refmap_phaeopus_b15m5p2/b15m5p2Plink/b15m5p2Prune
```
and this error was returned:
```
Compute the average genetic differences according to:
Dij = (1/|Mij|) sum_{m in Mij} (z_{im} - z_{jm})^2
where Mij is the set of SNPs where both i and j are called
[Data::getsize] Error opening plink files /Volumes/HonoursTQ/HuiZhen/Nume_refmap_phaeopus_b15m5p2/b15m5p2Plink/b15m5p2Prune.[bed/bim/fam]
```
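The formula at the top of that output is an average over the SNPs called in both individuals (the "pairwise complete" treatment of missing data). As a hedged illustration only, not the bed2diffs source, it can be sketched in C++ like this; the 0/1/2 genotype encoding with NaN for a missing call, the `Genotypes` alias and the `pairwise_diffs` name are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed encoding for this sketch: each genotype is 0, 1 or 2 (copies of the
// counted allele) and NaN marks a missing call.
// Rows are individuals, columns are SNPs.
using Genotypes = std::vector<std::vector<double>>;

// Hypothetical helper: D_ij = (1/|M_ij|) * sum_{m in M_ij} (z_im - z_jm)^2,
// where M_ij is the set of SNPs called in both individuals i and j.
// A pair with no SNPs called in both gets NaN.
std::vector<std::vector<double>> pairwise_diffs(const Genotypes& z) {
    const std::size_t n = z.size();
    std::vector<std::vector<double>> d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double sum = 0.0;
            std::size_t called = 0;  // |M_ij|
            for (std::size_t m = 0; m < z[i].size(); ++m) {
                // Skip SNPs that are missing in either individual
                if (std::isnan(z[i][m]) || std::isnan(z[j][m])) continue;
                const double diff = z[i][m] - z[j][m];
                sum += diff * diff;
                ++called;
            }
            d[i][j] = d[j][i] = called > 0 ? sum / static_cast<double>(called) : std::nan("");
        }
    }
    return d;
}
```

The matrix is symmetric with zeros on the diagonal, and the cost grows with (number of individuals)² × (number of SNPs), which is trivial for 16 individuals and ~7,000 SNPs.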
I have checked and confirmed that the files are in SNP-major mode, and I have verified that they are not damaged by using them in other analyses. The .bed, .bim and .fam files are all present in the directory provided. I have 16 individuals and around 7,000 SNPs. Does the problem lie in my using `src-wout-openmp` (and not `src`)?
Thank you for your help.
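For reference, the storage mode mentioned above is recorded in the first three bytes of a PLINK 1 binary .bed file: the magic number 0x6C 0x1B, then 0x01 for SNP-major or 0x00 for individual-major. A minimal C++ sketch for double-checking this directly from the file is below; the `bed_mode` name and its return strings are hypothetical, chosen for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reports the storage mode recorded in the 3-byte header
// of a PLINK 1 binary .bed file. The first two bytes are the magic number
// 0x6C 0x1B; the third is 0x01 for SNP-major and 0x00 for individual-major.
std::string bed_mode(const std::string& bed_path) {
    std::ifstream in(bed_path, std::ios::binary);
    if (!in) return "cannot open file";
    unsigned char header[3] = {0, 0, 0};
    if (!in.read(reinterpret_cast<char*>(header), 3)) return "file shorter than 3 bytes";
    if (header[0] != 0x6C || header[1] != 0x1B) return "not a PLINK 1 .bed file";
    if (header[2] == 0x01) return "SNP-major";
    if (header[2] == 0x00) return "individual-major";
    return "unknown mode byte";
}
```

Calling it on the .bed file at the `--bfile` prefix used above should return "SNP-major" if the header is as expected; any other result would point at the file rather than at the build.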
Hello
`bed2diffs` uses https://github.com/mfranberg/libplinkio to load the PLINK dataset. The error above occurs here:

```cpp
pio_open( &plink_file, datapath.c_str() ) != PIO_OK
```

which I've taken directly from the `libplinkio` documentation.
Can you check whether `bed2diffs` runs on the small example in the `test` folder? If it does, then the problem might be a version issue.

```
./src-wout-openmp/bed2diffs_v1 --bfile ./test/example-SNP-major-mode
```
Hi Dr Petkova,
Thank you for the advice.
I was able to run `bed2diffs` on the test dataset. Which component might have the version issue?
`bed2diffs` uses `libplinkio` (https://github.com/mfranberg/libplinkio) to load PLINK datasets.
The error above is printed in this code snippet, when the `libplinkio` function `pio_open` fails to open the dataset:
```cpp
if( pio_open( &plink_file, datapath.c_str() ) != PIO_OK )
{
    std::cerr << "[Data::getsize] Error opening plink files " << datapath << ".[bed/bim/fam]" << std::endl;
    exit(1);
}
```
I am not sure why `libplinkio` fails to open your files.
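Because that snippet prints the same message whenever `pio_open` does not return `PIO_OK`, one cheap thing to rule out is whether the process can actually open each of the three files at the exact prefix passed to `--bfile` (this assumes, as the ".[bed/bim/fam]" in the message suggests, that the extensions are appended to the prefix). The C++ sketch below checks that; `unreadable_plink_files` is a hypothetical helper, not part of bed2diffs or libplinkio. It needs C++17 (`-std=c++17`), and older GCC releases may also need `-lstdc++fs`.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: given the same prefix that is passed to --bfile, returns
// the members of the .bed/.bim/.fam triple that are not regular files or cannot
// be opened for reading by the current process.
std::vector<std::string> unreadable_plink_files(const std::string& prefix) {
    std::vector<std::string> problems;
    for (const char* ext : {".bed", ".bim", ".fam"}) {
        const std::string path = prefix + ext;
        std::error_code ec;  // non-throwing overload: errors count as "not a regular file"
        const bool regular = std::filesystem::is_regular_file(path, ec);
        std::ifstream in(path, std::ios::binary);
        if (!regular || !in) problems.push_back(path);
    }
    return problems;
}
```

An empty result means all three files open fine, which would shift suspicion back to libplinkio itself or to the build.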
Here is an alternative. Your dataset has 16 individuals and ~7,000 SNPs, so you should be able to load it into memory. I have R functions implementing `bed2diffs_v1` and `bed2diffs_v2`, which I've used for testing (`bed2diffs_v1` uses a nested double loop, so it will be slow for large samples). The following code snippet is a small but complete example.
It uses the MultiPhen R package (https://cran.r-project.org/web/packages/MultiPhen/index.html) to load the PLINK dataset.
```r
# Defines MultiPhen::read.plink
library("MultiPhen")

# Use the "pairwise.complete.obs" method to compute pairwise dissimilarities.
# This straightforward implementation uses a double loop,
# so it would be slow if the sample size is large.
bed2diffs_v1 <- function(genotypes) {
  nIndiv <- nrow(genotypes)
  diffs <- matrix(0, nIndiv, nIndiv)
  # seq_len() avoids looping over c(1, 0) when there is a single individual
  for (i in seq_len(nIndiv - 1)) {
    for (j in seq(i + 1, nIndiv)) {
      x <- genotypes[i, ]
      y <- genotypes[j, ]
      # Average over the SNPs called in both individuals i and j
      diffs[i, j] <- mean((x - y)^2, na.rm = TRUE)
      diffs[j, i] <- diffs[i, j]
    }
  }
  diffs
}

# Compute the diffs matrix using the "mean allele frequency"
# imputation method
bed2diffs_v2 <- function(genotypes) {
  nIndiv <- nrow(genotypes)
  nSites <- ncol(genotypes)
  missing <- is.na(genotypes)
  ## Impute NAs with the column means (= twice the allele frequencies)
  geno_means <- colMeans(genotypes, na.rm = TRUE)
  # nIndiv rows of genotype means
  geno_means <- matrix(geno_means, nrow = nIndiv, ncol = nSites, byrow = TRUE)
  ## Set the means which correspond to observed genotypes to 0
  geno_means[!missing] <- 0
  ## Set the missing genotypes to 0 (used to be NA)
  genotypes[missing] <- 0
  genotypes <- genotypes + geno_means
  similarities <- genotypes %*% t(genotypes) / nSites
  self_similarities <- diag(similarities)
  vector1s <- rep(1, nIndiv)
  # D_ij = s_ii + s_jj - 2 * s_ij
  diffs <-
    self_similarities %*% t(vector1s) +
    vector1s %*% t(self_similarities) - 2 * similarities
  diffs
}

# Assumes the working directory is the bed2diffs test folder
plink_dataset <- "example-SNP-major-mode"
genotypes <- MultiPhen::read.plink(plink_dataset)
genotypes

plink_dataset <- "example-sample-major-mode"
genotypes <- MultiPhen::read.plink(plink_dataset)
genotypes

# MultiPhen::read.plink converts to sample-major format
bed2diffs_v1(genotypes)
bed2diffs_v2(genotypes)
```
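To make the `bed2diffs_v2` logic concrete outside R, here is a hedged C++ sketch of the same mean-imputation approach. It is a hypothetical port for illustration, not the compiled bed2diffs code; the `Matrix` alias, the NaN-for-missing encoding and the `bed2diffs_v2_sketch` name are assumptions. It computes the mean squared difference over all SNPs after imputation, which is algebraically equal to the `s_ii + s_jj - 2 * s_ij` form used in the R function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rows are individuals, columns are SNPs; NaN marks a missing genotype.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical C++ port of the R function bed2diffs_v2:
// 1. replace each missing genotype with its SNP's mean over called individuals
//    (twice the allele frequency), as colMeans(..., na.rm = TRUE) does;
// 2. return D_ij = (1/nSites) * sum_m (x_im - x_jm)^2.
Matrix bed2diffs_v2_sketch(Matrix x) {
    const std::size_t n = x.size();
    const std::size_t p = n > 0 ? x[0].size() : 0;
    for (std::size_t m = 0; m < p; ++m) {
        double sum = 0.0;
        std::size_t called = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!std::isnan(x[i][m])) {
                sum += x[i][m];
                ++called;
            }
        }
        // Like colMeans in R, a SNP with no called genotypes yields NaN.
        const double mean = called > 0 ? sum / static_cast<double>(called) : std::nan("");
        for (std::size_t i = 0; i < n; ++i) {
            if (std::isnan(x[i][m])) x[i][m] = mean;
        }
    }
    Matrix d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double s = 0.0;
            for (std::size_t m = 0; m < p; ++m) {
                const double diff = x[i][m] - x[j][m];
                s += diff * diff;
            }
            d[i][j] = d[j][i] = s / static_cast<double>(p);
        }
    }
    return d;
}
```

For the toy matrix `{{0, 2}, {2, NaN}, {1, 0}}`, individual 1's missing call is imputed as 1 and the sketch gives D_01 = 2.5, whereas the pairwise-complete method of `bed2diffs_v1` would average over the single SNP both individuals share and give D_01 = 4.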
I've had to install EEMS many times, and here is the most painless way I have found:

(1-a) Install conda: https://conda.io/docs/
(2-a) `conda install boost=1.5.7`
(3-a) `conda install eigen`
(4-a) `make linux` (or `make darwin`)

If you know a bit of conda syntax, I recommend making an `eems` environment:

(1) `conda create -n eems`
(2) `source activate eems`

Then you can follow steps (1-a) through (4-a) above to create the environment necessary to run EEMS.
Thank you @halasadi for the guide!
One small correction: step (2-a) should be `conda install boost=1.57`.
To add on, the required paths are:

```
EIGEN_INC = your_path_to/miniconda/envs/eems/include/eigen3
BOOST_LIB = your_path_to/miniconda/envs/eems/lib
BOOST_INC = your_path_to/miniconda/envs/eems/include
```

Put these lines in `EEMS/runeems_sats/src/Makefile` and/or `EEMS/runeems_snps/src/Makefile`, then run `make`. Good luck!
Shujun
Thank you, everyone, for the helpful suggestions. In the end it was a version issue with Eigen and Boost. EEMS is now installed and runs.
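Since the root cause was the Eigen and Boost versions, it can help to confirm which versions the compiler actually picks up from the `EIGEN_INC` and `BOOST_INC` paths. The C++17 sketch below reads the standard version macros (`EIGEN_WORLD_VERSION`, `EIGEN_MAJOR_VERSION`, `EIGEN_MINOR_VERSION` and `BOOST_LIB_VERSION`); the helper names are hypothetical. It only inspects headers, so it will not reveal a mismatch with the compiled Boost libraries that the linker finds under `BOOST_LIB`.

```cpp
#include <string>

// Only include the headers if the compiler can find them on the include path
// (e.g. via -I pointing at EIGEN_INC and BOOST_INC); __has_include is C++17.
#if __has_include(<Eigen/Core>)
#include <Eigen/Core>
#endif
#if __has_include(<boost/version.hpp>)
#include <boost/version.hpp>
#endif

// Hypothetical helper: Eigen version seen by the compiler, e.g. "3.2.2", or "not found".
std::string eigen_version() {
#ifdef EIGEN_WORLD_VERSION
    return std::to_string(EIGEN_WORLD_VERSION) + "." + std::to_string(EIGEN_MAJOR_VERSION) + "." +
           std::to_string(EIGEN_MINOR_VERSION);
#else
    return "not found";
#endif
}

// Hypothetical helper: Boost header version seen by the compiler, e.g. "1_57", or "not found".
std::string boost_version() {
#ifdef BOOST_LIB_VERSION
    return BOOST_LIB_VERSION;
#else
    return "not found";
#endif
}
```

Compiled with the same `-I` flags as the EEMS Makefile and called from a small `main`, this should report "3.2.2" and "1_57" for the versions discussed below; "not found" means the include path does not point where the Makefile expects.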
Hi Dr. Petkova, I am trying to build EEMS with Eigen 3.2.2 and Boost 1.57.
After installing Eigen and Boost, I get an error when trying to link them to the EEMS program (the full error message is at the bottom):
I have seen similar questions and have tried various solutions: changing the order of libraries/programs during compilation, switching from libc++ to the GNU C++ standard library (libstdc++) to build Boost, and adding `-lboost_serialization`. None of them worked. I appreciate your help and advice.
Installation steps:
This is where the error occurs (error message below):
Thank you!