HRGV / phyloFlash

phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an illumina (meta)genomic dataset.
GNU General Public License v3.0

Conda install doesn't work #105

Closed aassie closed 4 years ago

aassie commented 4 years ago

Hello there,

Installing phyloFlash through conda doesn't work. I have tried on my current machine (macOS Mojave 10.14.6) and on a Catalina machine with a fresh install of conda; in both cases, the following happens:

Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

Any idea how to go around this?

kbseah commented 4 years ago

Hi Adrien,

Have you tried using strict channel priority, as described under section 2.1 here? The priority should be conda-forge, bioconda, then defaults. If you want to enforce strict channel priority for this install only, use conda install --strict-channel-priority [etc.].

Let me know if this helps.

Best regards, Brandon

aassie commented 4 years ago

Hej Brandon,

I did follow the installation instructions, unfortunately to no effect.

aassie commented 4 years ago

That being said, installation through GitHub works perfectly fine. But it is easier for me to ask students who are not familiar with the command line to install the tool through conda ;)

aassie commented 4 years ago

Alright, I have created a phyloflash environment and now the installation works on my machine. Will test it again on the student machine and update the issue

Sorry about that

kbseah commented 4 years ago

Ok that's odd.. thanks for reporting. It is possible that updates to the conda solver are the cause of the problem. Which version of Conda are you using? Are your students also using Mac OS X? Or are they using Linux? Many packages release different builds for different OS's, and some incompatibility there may be causing the problem, too.

aassie commented 4 years ago

Yeah, the student is using a Mac with Catalina on it. I have refrained from upgrading so far because I read that there are many issues with this release (for example, the switch from bash to zsh).

I'll get the full error report this afternoon (texas time) and update it here.

kbseah commented 4 years ago

Oh boy, I wasn't aware of the switch from bash to zsh.. that's gotta annoy a lot of people.

Alright, we'll wait for your update. Good to know that you can still get it working via direct download from Github. Another alternative is to install the dependencies (bbmap, spades, etc.) into a single Conda environment, then just get the phyloFlash release from Github. In that way you'll still have a reproducible environment to work in.

aassie commented 4 years ago

My student had a fresh installation of Conda on macOS Catalina 10.15.3, and trying to install phyloflash following the tutorial gave this error:

Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: / 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package msgpack-python conflicts for:
phyloflash -> emirge -> biopython -> mmtf-python -> msgpack-python
Package python_abi conflicts for:
phyloflash -> emirge -> pysam -> bcftools=1.6 -> matplotlib -> matplotlib-base[version='>=2.2.5,<2.2.6.0a0'] -> python_abi=2.7[build=*_cp27m]
python=3.7 -> pip -> setuptools -> python_abi=3.7[build=*_cp37m]
Package mmtf-python conflicts for:
phyloflash -> emirge -> pysam -> mmtf-python
Package python conflicts for:
python=3.7
python=3.7
Package certifi conflicts for:
python=3.7 -> pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
phyloflash -> emirge -> python[version='>=2.7,<2.8.0a0'] -> pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
Package ca-certificates conflicts for:
phyloflash -> r-base=3.4.1 -> openssl[version='>=1.0.2o,<1.0.3a'] -> ca-certificates
python=3.7 -> openssl[version='>=1.0.2o,<1.0.3a'] -> ca-certificates
Package biopython conflicts for:
phyloflash -> emirge -> biopython
Package setuptools conflicts for:
phyloflash -> emirge -> python[version='>=2.7,<2.8.0a0'] -> pip -> setuptools
python=3.7 -> pip -> setuptools
Package python-dateutil conflicts for:
phyloflash -> emirge -> pysam -> bcftools=1.6 -> matplotlib -> python-dateutil
Note that strict channel priority may have removed packages required for satisfiability.

However, creating a new environment with conda create --name phyloFlash and repeating the exact same command worked fine.
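The dependency chains in the error output above already name two different constraints on python itself. As a rough sketch (the function name `python_pins` is made up for illustration and is not part of conda), the chains can be scanned for every constraint placed directly on the `python` package:

```python
import re

# Matches a chain element that constrains the python package itself, e.g.
# "python=3.7" or "python[version='>=2.7,<2.8.0a0']". Elements such as
# "python_abi=2.7[...]" or "python-dateutil" are deliberately not matched.
PYTHON_PIN = re.compile(
    r"python(?:=(?P<exact>[\d.]+)|\[version='(?P<spec>[^']+)'\])"
)


def python_pins(conflict_lines):
    """Return the set of python version constraints found in the chains."""
    pins = set()
    for line in conflict_lines:
        for element in line.split(" -> "):
            match = PYTHON_PIN.fullmatch(element.strip())
            if match:
                pins.add(match.group("exact") or match.group("spec"))
    return pins
```

Fed the chains from the log, this reports both a 3.7 pin and a ">=2.7,<2.8.0a0" range, and the two cannot be satisfied in one environment.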

kbseah commented 4 years ago

Was your student trying to conda install phyloflash into the base environment?

If so, installing to the base environment (or an existing environment) is the issue. The immediate cause of the problem is that EMIRGE depends on python2, but conda defaults to python3 now.

In general I have found that installing anything into the base environment of conda tends to break things, if not immediately then after a while. It's better to create a new environment for each tool or set of tools that you need, rather than conda install them incrementally. From my understanding, the solver resolves all dependencies at once when you create the environment, but when you install things subsequently it can run into conflicts because it's trying to update an already-existing solution. This problem keeps cropping up on forums, and it is now mentioned (in passing) in the conda documentation: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

> You don't want to put programs into your base environment, though. Create separate environments to keep your programs isolated from each other.
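For scripted setups (e.g. preparing student machines), the advice above could be wrapped roughly as follows. This is only a sketch: the function names are made up for illustration, it assumes `conda` is on PATH, and it passes the channels in the priority order suggested earlier in this thread (conda-forge, bioconda, defaults). The package name `phyloflash` is taken from the solver output.

```python
import shutil
import subprocess

# Highest priority first, as recommended earlier in the thread.
CHANNELS = ("conda-forge", "bioconda", "defaults")


def conda_create_command(env_name, packages, channels=CHANNELS):
    """Build the argv for creating a fresh, dedicated environment."""
    cmd = ["conda", "create", "--yes", "--name", env_name,
           "--strict-channel-priority"]
    for channel in channels:
        # Channels given on the command line are prioritised in order.
        cmd += ["--channel", channel]
    return cmd + list(packages)


def create_env(env_name, packages):
    """Run the command; raises if conda is missing or the solve fails."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda not found on PATH")
    subprocess.run(conda_create_command(env_name, packages), check=True)
```

Calling `create_env("phyloflash", ["phyloflash"])` would then solve everything in one go in a new environment instead of patching the base environment.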

aassie commented 4 years ago

Yeah, I'm slowly getting to the same point. It just gets weird to activate and deactivate conda envs in pipeline scripts afterward (>lazy bum)

kbseah commented 4 years ago

I totally understand what you mean, it is extra overhead...

Have you tried snakemake? It can take care of the conda environments for you, and also cluster job submission. Just have to keep an eye on disk usage because conda tends to eat up the gigabytes.

If things work ok now, could you close the issue? Can't do it because I don't have the permissions. If you want to talk about snakemake or conda just drop me an email!

epruesse commented 4 years ago

> This problem keeps cropping up on forums, and they now mention it (in passing) in the conda documentation: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

I put a big "Edit on Github" banner on the bioconda.github.io pages. Where would you have found this information? How should it be put? The issue itself is well known; the question is just where to put it so that users either find it as soon as they run into it or, ideally, before they run into it, but in a way that they actually remember...

> It just get weird to call and un-call conda env in pipeline scripts afterward (>lazy bum)

It's valid to call ~/miniconda3/envs/PF/bin/phyloFlash.pl directly. Conda rewrites all the paths inside the env (which is why envs can't be moved around) to achieve just this. You don't have to activate the env to call its binaries. At least not usually. (Check in $ENV/etc/conda/activate.d for scripts setting paths like ... umh ... ARBHOME and such.)
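In a pipeline script that could look roughly like the sketch below. The helper names are hypothetical, the install root and the env name `PF` are just the ones used as an example above, and the `-check_env` argument is assumed from the phyloFlash README.

```python
from pathlib import Path


def env_executable(conda_root, env_name, program):
    """Absolute path of a program inside a conda env, no activation needed."""
    return Path(conda_root).expanduser() / "envs" / env_name / "bin" / program


def env_command(conda_root, env_name, program, *args):
    """argv list suitable for subprocess.run()."""
    return [str(env_executable(conda_root, env_name, program)), *args]


# Example (not executed here):
#   import subprocess
#   subprocess.run(
#       env_command("~/miniconda3", "PF", "phyloFlash.pl", "-check_env"),
#       check=True,
#   )
```

If the env ships scripts in its activate.d directory, the variables they set would still have to be exported by hand before calling the binary this way.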

> Have you tried snakemake?

Wanna try YMP? ;-)

kbseah commented 4 years ago

Thanks for the input, Elmar! I wasn't sure if one could call the program directly without activating the environment explicitly. Haven't looked at the bioconda docs for a while, will see if it can be clarified there.

Link to YMP for Adrien: https://ymp.readthedocs.io/en/latest/

aassie commented 4 years ago

Thank you both for the tips 👍

epruesse commented 4 years ago

> I wasn't sure if one could call the program directly without activating the environment explicitly.

You can. Activating an env primarily changes PATH to make the $MINICONDA/envs/$ENV/bin folder available to you in the shell (plus PS1 for the prompt and some tracking variables). The files in the environment are all altered to be self-contained; that is, the python inside a specific env will have its own site-packages and stay within that, with the same happening for every path in every binary and file. That's why scripts in conda packages shouldn't have a #!/bin/perl line, but either #!/path/to/my/perl or #!/usr/bin/env perl. That way, the right perl is launched, the one that has all the libs installed.

The rationale for doing this over setting LD_LIBRARY_PATH and the Perl/Python/R equivalents is that the environment variables stay "free" for other usage, e.g. for compute clusters using module to manage software components.
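As a rough illustration of the two points above, here is a simplified sketch. It is not conda's actual activation code, and both function names are made up: the first mimics the PATH change on activation, the second checks whether a script's shebang follows the rule described for conda packages.

```python
import os


def activated_path(env_prefix, current_path):
    """Mimic the main effect of activation: put <env>/bin first on PATH."""
    bin_dir = os.path.join(env_prefix, "bin")
    rest = [p for p in current_path.split(os.pathsep) if p and p != bin_dir]
    return os.pathsep.join([bin_dir, *rest])


def shebang_is_env_safe(first_line, env_prefix):
    """True if the shebang uses /usr/bin/env or an interpreter inside the env.

    A hard-coded system interpreter such as #!/bin/perl returns False.
    Note that /usr/bin/env looks the interpreter up on PATH at run time.
    """
    if not first_line.startswith("#!"):
        return False
    parts = first_line[2:].split()
    if not parts:
        return False
    interpreter = parts[0]
    inside_env = interpreter.startswith(env_prefix.rstrip("/") + "/")
    return interpreter == "/usr/bin/env" or inside_env
```

Running the second check over the first line of each script in an env's bin folder is a quick way to spot scripts that would pick up the system perl or python instead of the env's own.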