Open steigeec opened 1 year ago
Hi, thank you for using GADMA!
Inferring the demographic history of three populations can be time-consuming. Do you have a total of 192 diploid individuals? If so, then it will be time-consuming indeed. You can check the processing speed in any of the following files: output_dir/N/eval_file, where N is the run number. Each line in this file corresponds to one evaluation of log-likelihood, which can give you an idea of the processing speed.
You have two options (as you have already mentioned): downsample your data or use Bayesian optimization. Using eval_file, you can estimate how long it will take to evaluate 300 or 400 log-likelihoods, as this is the recommended number of evaluations for Bayesian optimization. Since you've been running GADMA for a month and have no models yet, it might still be quite slow. If you are using dadi, not moments, then you should use Bayesian optimization for three populations in almost any case. I can assist you with whichever option you choose.
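The speed estimate described above can be sketched in a few lines of Python. This is a minimal sketch, not part of GADMA: the function names are hypothetical, it assumes eval_file contains exactly one line per log-likelihood evaluation with no header lines, and the elapsed time has to be supplied by hand.

```python
def count_evaluations(eval_file):
    """Count non-empty lines, assuming one line per log-likelihood evaluation."""
    with open(eval_file) as handle:
        return sum(1 for line in handle if line.strip())


def hours_for_target(n_done, hours_elapsed, target=400):
    """Extrapolate linearly to the hours needed for `target` evaluations."""
    if n_done <= 0 or hours_elapsed <= 0:
        raise ValueError("need at least one evaluation and a positive elapsed time")
    return target * hours_elapsed / n_done
```

For example, hours_for_target(count_evaluations("output_dir/1/eval_file"), 24.0) extrapolates from one day of running in run 1. The extrapolation is linear, so it is only a rough guide.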
Thank you, Ekaterina, for your very helpful response!
I have decided to both downsample the SFS and use Bayesian optimization. I just wanted to follow up with you as I have struggled to get the optimization to work.
Manual GADMA installation has not worked for me in the past, as I could not get the correct combination of dependencies together. For my work with GADMA thus far, I have used a conda installation. However, my conda installation as currently installed doesn't have access to the Bayesian optimization algorithm:
ValueError: Optimizer 'SMAC_BO_combination' is not registered
I have started a fresh conda environment, installing the versions of those modules listed in minimal.txt and bayes_opt.txt, but I haven't found a way to install the required modules without incompatibilities arising. Do you have a recommended complete conda installation command which includes all necessary modules for running GADMA with the dadi engine, using Bayesian optimization?
Hi,
I am glad to hear from you. In my experience, it can be difficult to install specific versions of packages using conda; as I remember, it forces you to use the latest version. However, if it suits you, you can install all required versions (bayes_opt.txt) using pip. Python should be able to use all packages that are installed by either conda or pip. However, make sure that the fresh versions you installed with conda are uninstalled, otherwise there can be a conflict. I hope that helps.
One last thing: what OS are you using? Just in case: SMAC, which is required for Bayesian optimization, does not work on Windows.
Best regards, Ekaterina
Hi, Ekaterina-
Thanks so much! I am on Ubuntu. I've definitely tried pip along the way. It seems that conda is the most promising path for installation on my system, though, given all the problems I've had with incompatibilities. I'm thinking that GADMA may have some current compatibility issues with numpy? If I install gadma using
mamba create -n DADI -c conda-forge -c bioconda -c anaconda python=3.8 --file gadma_reqs.txt
where gadma_reqs.txt is
setuptools_scm
numpy
scipy
matplotlib
matplotlib
Pillow
ruamel.yaml
mpmath
Cython
networkx
h5py
scikit-allel
pandas
dadi
scikit-optimize
configspace
scikit-learn
smac
then
mamba activate DADI
gadma --test
I get a numpy error that other installation approaches have also been giving me in these recent attempts:
AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Thank you for being so responsive and helpful!!
My best, Emma
Hi, Ekaterina-
I just thought I'd follow up on this question!
Do you think that a numpy incompatibility issue with the current GADMA release indeed might be to blame here?
Thanks so much, Emma
Dear Emma,
Thank you for reminding me about your issue, and sorry for the slow reply. Yes, it is definitely the numpy version that causes the error you see. I recommend uninstalling numpy from conda and installing a specific version manually using pip. For example, I tested numpy 1.22.4 and it worked fine. To install a specific version:
pip install numpy==1.22.4
I will also try to fix this error in the next release.
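As background to the version advice above: the error message says the np.bool alias was deprecated in NumPy 1.20, and the alias was then removed in NumPy 1.24, which is why older releases such as 1.22.4 avoid the error. A minimal sketch of that check, with hypothetical helper names and assuming a plain version string such as '1.22.4':

```python
def version_tuple(version):
    """Return (major, minor) from a version string such as '1.22.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def predates_alias_removal(numpy_version):
    """True if this NumPy release is older than 1.24, where np.bool was removed."""
    return version_tuple(numpy_version) < (1, 24)
```

Here predates_alias_removal("1.22.4") is true, so code that still uses np.bool runs on that release, possibly with a deprecation warning.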
Best regards, Ekaterina
Hi, Ekaterina! -
Thanks so much for your suggestion. I have tried installation pathways through both pip and conda, and still am unable to complete the installation of GADMA with Bayesian optimization. In my previous installation of GADMA (before I was using Bayesian optimization), I really struggled with installation via pip, and eventually overcame the difficulty by relying entirely on conda. Now, layering on the requirements for Bayesian optimization, conda is also not delivering!
The recurring problem appears to be numpy. I have tried specifying particular versions (many of them), and I have also tried leaving the version unspecified, in the hope that mamba or conda would detect and resolve incompatibilities during installation.
Something that would really be lovely would be to have a conda/mamba compatibility requirements file with all the exact versions required to play together nicely for GADMA run with the dadi engine, the Bayesian optimization algorithm, and moments (to allow visualization). This is what I've been attempting to compile as I try different combinations of module versions... I did initially try putting together the various requirement files from the GADMA github install, but numpy (as ever) causes problems.
I would be so grateful for any support you can provide! I have come back to this challenge every few weeks, hoping that I'd be newly inspired to solve this puzzle, but I have had no luck since my initial install woes in November!
My very best, Emma
Dear Emma,
I am sure that together we can solve this problem! Let us check the following things:
>>> import smac
>>> smac.__version__
What version of smac do you have?
Uninstall all versions of numpy (run pip uninstall numpy and conda uninstall numpy as many times as required) and install it once using pip.
I am sorry, I am not familiar with mamba. I usually use conda environments, where packages can be installed using both conda and pip. Is that also allowed in mamba?
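The version checks above can also be run in one go from the active environment. The following is a minimal sketch using only the Python standard library (importlib.metadata, available from Python 3.8). The helper names are hypothetical, and the pinned versions are simply the ones mentioned in this thread (numpy 1.22.4, smac 0.13.1).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(found, pinned):
    """Return {name: (found, wanted)} for packages that differ from the pins."""
    return {
        name: (found.get(name), wanted)
        for name, wanted in pinned.items()
        if found.get(name) != wanted
    }


if __name__ == "__main__":
    pins = {"numpy": "1.22.4", "smac": "0.13.1"}  # versions named in this thread
    print(mismatches(installed_versions(pins), pins))
```

An empty dictionary means the environment Python actually sees matches the pins. Any entry shows the version found, or None if the package is missing.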
I am looking forward to hearing from you!
Best regards, Ekaterina
Hi, Ekaterina -
Thanks so much for your support. I can't wait to get things installed properly!
Mamba does let you install things using both it and pip! To follow up on your suggestions, what I first did was pip uninstall any versions of dependencies for GADMA currently on my system. I then tried specifying the versions of numpy and smac during a fresh mamba install, to make sure I end up with the versions we know should work (numpy=1.22.4, smac=0.13.1). Then, I downloaded GADMA with git and installed GADMA itself with the pip install . command. Everything appeared to install correctly, but then would fail during the test command. I realized that when I pip installed GADMA, the versions of numpy and dadi I had specified were being overwritten.
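One possible way to keep pip from replacing pinned packages is a pip constraints file, passed with pip's -c / --constraint option. This has not been tested with GADMA in this thread, so treat it as an assumption. The sketch below only renders the file contents, and the helper name is hypothetical.

```python
def constraints_text(pins):
    """Render {'numpy': '1.22.4'} as pip constraint lines ('numpy==1.22.4')."""
    return "".join(
        f"{name}=={version}\n" for name, version in sorted(pins.items())
    )


if __name__ == "__main__":
    # Versions named in this thread; adjust to whatever is known to work.
    with open("constraints.txt", "w") as handle:
        handle.write(constraints_text({"numpy": "1.22.4", "smac": "0.13.1"}))
```

With the file written, pip install . -c constraints.txt asks pip to keep those versions. If they conflict with what the package requires, pip should report an error instead of silently installing something else.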
Next, I tried setting up all my dependencies with conda again, but instead doing the conda install of gadma. In this instance, the install doesn't complete successfully:
warning libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
Could not solve for environment specs
The following packages are incompatible
└─ gadma is installable with the potential options
├─ gadma [2.0.0rc16|2.0.0rc17|2.0.0rc18] would require
│ └─ nlopt >=2.7.0,<2.7.1.0a0 , which does not exist (perhaps a missing channel);
├─ gadma 2.0.0rc18 would require
│ └─ python_abi 3.6.* *_cp36m, which does not exist (perhaps a missing channel);
├─ gadma 2.0.0rc18 would require
│ └─ python_abi 3.7.* *_cp37m, which does not exist (perhaps a missing channel);
└─ gadma [2.0.0|2.0.0rc19|...|2.0.0rc26] would require
└─ dadi with the potential options
├─ dadi 1.7.0 would require
│ └─ python >=2.7,<2.8.0a0 , which can be installed;
├─ dadi [2.0.3|2.0.4|2.0.5] would require
│ └─ python >=3.7,<3.8.0a0 , which can be installed;
└─ dadi [2.0.4|2.0.5] would require
└─ python >=3.6,<3.7.0a0 , which can be installed.
As I mentioned before, I had been using python 3.8 until this point. Now, I instead moved to python 3.7 again, and specified nlopt=2.7.0 in my list of requirements. Sadly, now I find that numpy and python start fighting:
warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
Could not solve for environment specs
The following packages are incompatible
├─ numpy 1.22.4 is installable with the potential options
│ ├─ numpy 1.22.4 would require
│ │ └─ python_abi 3.10.* *_cp310, which can be installed;
│ ├─ numpy 1.22.4 would require
│ │ └─ python_abi 3.8 *_pypy38_pp73, which can be installed;
│ ├─ numpy 1.22.4 would require
│ │ └─ python_abi 3.8.* *_cp38, which can be installed;
│ ├─ numpy 1.22.4 would require
│ │ └─ python_abi 3.9.* *_cp39, which can be installed;
│ └─ numpy 1.22.4 would require
│ └─ python_abi 3.9 *_pypy39_pp73, which can be installed;
└─ python 3.7 is not installable because there are no viable options
├─ python 3.7.0 would require
│ └─ python_abi * *_cp37m, which conflicts with any installable versions previously reported;
└─ python 3.7.0 conflicts with any installable versions previously reported.
What I would love to try would be to specify the versions of all the dependencies in my conda install, copying exactly what we know works on your system! Might you share these dependency versions?
Dear Emma,
Wow, thank you for the details! Below are the steps that I used just now to install and run GADMA on my laptop. I hope they will help you.
First, you said that the versions of numpy and smac were overwritten after the pip installation of GADMA. That usually means that pip does not see these packages as installed for some reason. If you installed them using mamba, there is probably a way to tell pip about the mamba installation directory.
Second, I would recommend using Python 3.8, as it appears to be the more reliable version.
Here are the steps I used to install GADMA in a local environment:
1) Create an empty conda environment with Python 3.8:
conda create -n gadma_env python=3.8
2) Activate environment
conda activate gadma_env
3) Install specific versions (a specific version of matplotlib is required for drawing with moments) and nlopt (for some reason there was an error during gadma installation):
pip install ruamel.yaml==0.16.12
pip install matplotlib==3.5.3
conda install nlopt
4) Install moments, e.g. using conda (works for Windows and Linux, but not for MacOS)
conda config --add channels bioconda
conda install moments
5) Clone GADMA repository and install it
git clone https://github.com/ctlab/GADMA.git
cd GADMA
pip install .
I did not try it, but this should probably also work:
pip install gadma
6) After that, everything worked for me
gadma --test
Here is the output of my conda list:
# packages in environment at /Users/noskovae/anaconda3/envs/gadma_env:
#
# Name Version Build Channel
attrs 23.2.0 pypi_0 pypi
blas 1.0 openblas
bzip2 1.0.8 h80987f9_5
ca-certificates 2024.3.11 hca03da5_0
contourpy 1.1.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
dadi 2.3.3 pypi_0 pypi
demes 0.2.3 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
gadma 2.0.1.dev7 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
libcxx 16.0.6 h4653b0c_0 conda-forge
libffi 3.4.4 hca03da5_0
libgfortran 5.0.0 11_3_0_hca03da5_28
libgfortran5 11.3.0 h009349e_28
libopenblas 0.3.21 h269037a_0
libsqlite 3.45.2 h091b4b1_0 conda-forge
libzlib 1.2.13 h53f4e23_5 conda-forge
llvm-openmp 14.0.6 hc6e5704_0
matplotlib 3.5.3 pypi_0 pypi
moments 1.1.15 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h313beb8_0
nlopt 2.7.1 py38h6f14d55_4 conda-forge
numpy 1.24.3 py38h1398885_0
numpy-base 1.24.3 py38h90707a3_0
openssl 3.0.13 h1a28f6b_0
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 pypi_0 pypi
pip 23.3.1 py38hca03da5_0
pyparsing 3.1.2 pypi_0 pypi
python 3.8.16 h3ba56d0_1_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.8 4_cp38 conda-forge
pytz 2024.1 pypi_0 pypi
readline 8.2 h1a28f6b_0
ruamel-yaml 0.16.12 pypi_0 pypi
ruamel-yaml-clib 0.2.8 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
setuptools 68.2.2 py38hca03da5_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h80987f9_0
tk 8.6.12 hb8d0fd4_0
tzdata 2024.1 pypi_0 pypi
wheel 0.41.2 py38hca03da5_0
xz 5.4.6 h80987f9_0
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h53f4e23_5 conda-forge
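To compare a listing like the one above against another environment, the conda list output can be parsed into a name-to-version mapping. This is a minimal sketch with a hypothetical function name. It assumes the usual whitespace-separated columns (name, version, build, channel) and comment lines starting with '#'.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` output."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the '#' header lines.
        if not parts or parts[0].startswith("#"):
            continue
        if len(parts) >= 2:
            versions[parts[0]] = parts[1]
    return versions


if __name__ == "__main__":
    sample = """# Name Version Build Channel
matplotlib 3.5.3 pypi_0 pypi
numpy 1.24.3 py38h1398885_0
"""
    print(parse_conda_list(sample))
```

Parsing the listing from each machine and comparing the two dictionaries shows which packages differ in version.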
Best regards, Ekaterina
Hi, Ekaterina-
Thank you so much for your kindness and helpfulness in guiding me through this installation procedure!
With the information you provided and the support of an incredible labmate, I was finally able to complete this installation process. In case others are struggling, I want to share this solution here !!
conda env create -n DADI -f gadma_requirements.yaml
... where gadma_requirements.yaml is:
channels:
- conda-forge
- bioconda
dependencies:
- python=3.8
- pip
- setuptools_scm<=7.1.0
- numpy>=1.16.5,<1.23.0
- scipy>=0.6.0,<1.7.0
- matplotlib<=3.5.3
- Pillow>=4.2.1
- ruamel.yaml==0.16.12
- mpmath
- Cython
- networkx
- h5py
- scikit-allel
- pandas
- moments
- dadi=2.3.2
- demes
- demesdraw
- gadma
- scikit-optimize
- configspace
- pip:
- smac==0.13.1
This worked perfectly for him. I still have some issues with pip, so I needed to run this separately:
pip install smac==0.13.1
Thanks, again, so much, Ekaterina! I am thrilled to have GADMA with BO running now!!
Hi @steigeec,
I am glad you have successfully overcome the installation issue! Please let me know if you have any further questions.
Ekaterina
Hi, Ekaterina -
Thanks again for this incredibly useful program!
I had a great experience using it for 2-population models. Ultimately, I want to fit a 3-population model, but GADMA has not been able to print any models yet (192 individuals total, and 6.7 million SNPs). To facilitate convergence, I changed my final structure to match the initial structure: [1,1,1], but GADMA has been running 30 processes for a month now and hasn't been able to print a model. Do you recommend that I use the Bayesian optimization ensemble that is shown in the example of inference with four and five populations? Or perhaps I should consider downprojecting my data? Might you have other recommendations for how to assist convergence?
Thanks so much!
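To give a feel for why downprojecting helps: a joint site frequency spectrum for populations with n1, n2, n3 sampled chromosomes has (n1+1)(n2+1)(n3+1) entries. The sketch below only counts entries. The equal split of 192 diploid individuals into three populations of 64 (128 chromosomes each) and the projection target of 20 chromosomes are hypothetical, and the array size is only a rough proxy for computational cost.

```python
from math import prod  # available from Python 3.8


def sfs_entries(chromosomes_per_pop):
    """Number of cells in the joint SFS for the given sample sizes."""
    return prod(n + 1 for n in chromosomes_per_pop)


if __name__ == "__main__":
    full = sfs_entries([128, 128, 128])   # hypothetical equal split
    projected = sfs_entries([20, 20, 20])  # hypothetical projection target
    print(full, projected, full / projected)
```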