conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Support pip packages and pip GH installs in a conda environment.yaml file list. #4

Closed ocefpaf closed 2 years ago

ocefpaf commented 4 years ago

If you have something like:

name: test
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip:
    - some-pkg
    - git+https://github.com/someuser/another-pkg.git@master

conda-lock will crash. We should either fail more gracefully or add support for pip installs too.

marcelotrevisani commented 4 years ago

+1 for support of pip install --no-deps

mariusvniekerk commented 4 years ago

main issue with pip is lack of determinism. But we can probably work around that

ocefpaf commented 4 years ago

The install from master example above is just wrong IMO and I added it to discuss some sort of warning for conda-lock. For PyPI packages we can probably look into pip-tools for inspiration.

ocefpaf commented 4 years ago

+1 for support of pip install --no-deps

In a way that is what conda-env does, right? Not sure if it issues the --no-deps though.

I don't want to make this overly complicated. The first pass should be just a warning instead of a failure: "You have pip packages in your environment file; this env won't pass a round trip and won't be fully reproducible. The pip packages are dropped from the lock file."
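
A minimal sketch of that first pass, assuming the environment file has already been parsed into a dict. `split_pip_dependencies` is a hypothetical helper for illustration, not part of conda-lock:

```python
import warnings


def split_pip_dependencies(env_data):
    """Return (conda_deps, pip_deps) from a parsed environment.yaml dict.

    Hypothetical helper for illustration; not part of conda-lock.
    """
    conda_deps, pip_deps = [], []
    for dep in env_data.get("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            # The pip section is a one-key mapping: {"pip": [...]}.
            pip_deps.extend(dep["pip"] or [])
        else:
            conda_deps.append(dep)
    if pip_deps:
        # Warn instead of crashing; the pip entries are simply left out.
        warnings.warn(
            "You have pip packages in your environment file; this env won't "
            "pass a round trip and won't be fully reproducible. The pip "
            "packages are dropped from the lock file."
        )
    return conda_deps, pip_deps


env = {
    "name": "test",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.7", {"pip": ["some-pkg"]}],
}
conda_deps, pip_deps = split_pip_dependencies(env)
```

Only `conda_deps` would then be handed to the solver, so the lock step no longer fails on the nested `pip:` mapping.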

noahp commented 4 years ago

Pip can check hashes for downloaded files: https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
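
As a rough illustration of what hash-checking mode expects, here is a sketch that formats a pinned requirement line with a `--hash=sha256:` option. The helper name and the placeholder bytes are assumptions for illustration; a real lock step would hash the actual downloaded wheel or sdist:

```python
import hashlib


def requirement_with_hash(name, version, artifact_bytes):
    """Format a pinned requirement line for pip's hash-checking mode."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"


# Placeholder bytes standing in for a downloaded wheel or sdist.
line = requirement_with_hash("some-pkg", "1.0.0", b"example artifact contents")
print(line)
```

A requirements file made of such lines can be installed with `pip install --require-hashes -r requirements.txt`, and pip then refuses any file whose hash does not match.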

ocefpaf commented 4 years ago

Pip can check hashes for downloaded files: pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode

I guess that the main challenge here is to make conda really understand what was installed by pip. I'm not even sure whether an old bug, which caused packages with a - in the name to be wrongly identified as pip packages even when installed with conda, has been fixed.

TL;DR this may be more difficult than it sounds b/c one would have to dig into conda itself.

@mariusvniekerk what do you say? Am I way off here?

mariusvniekerk commented 4 years ago

So we can put some of the pip packages as a magic comment.
This also implies that conda-lock needs an install mode so that it runs the appropriate pip installs, since I'm pretty sure conda install isn't going to

sterlinm commented 4 years ago

One option that wouldn't quite achieve reproducibility on the pip end but still might be useful is to use conda-lock to update the conda dependencies in the environment.yml file with the locked conda packages while leaving the pip section of the dependencies alone. It wouldn't guarantee the reproducibility of the pip packages but those are always installed after the conda environment is solved, so it still might be an improvement for some people. I have a clunky script that's doing something like this.

from copy import deepcopy

import yaml
# These helpers come from conda-lock's internals at the time of writing.
from conda_lock.conda_lock import solve_specs_for_arch, ensure_conda, fn_to_dist_name, search_for_md5s

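def load_env_file(file_name):
    """Read an environment.yml file into a dict."""
    with open(file_name, 'r') as f:
        # safe_load is enough for plain environment files and avoids
        # constructing arbitrary Python objects.
        data = yaml.safe_load(f)
    return data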

def write_env_file(env_data, file_name):
    """Write the environment dict back out as YAML."""
    with open(file_name, 'w') as f:
        yaml.dump(env_data, stream=f, Dumper=yaml.Dumper)

def lock_conda_specs(conda_dependencies: list, channels: list) -> list:
    """Given a list of conda dependencies return a list of locked dependencies."""
    conda_path = ensure_conda()
    platform = 'linux-64'
    # Dry-run solve: nothing is installed, we only read the planned actions.
    dry_run_install = solve_specs_for_arch(
        conda=conda_path,
        channels=channels,
        specs=conda_dependencies,
        platform=platform
    )

    # Check for success before touching "actions", which may be absent on failure.
    if not dry_run_install['success']:
        raise RuntimeError('solve failed')
    link_actions = dry_run_install["actions"]["LINK"]
    for link in link_actions:
        link["url_base"] = f"{link['base_url']}/{link['platform']}/{link['dist_name']}"
        link["url"] = f"{link['url_base']}.tar.bz2"
        link["url_conda"] = f"{link['url_base']}.conda"
    link_dists = {link["dist_name"] for link in link_actions}

    fetch_actions = dry_run_install["actions"]["FETCH"]

    fetch_by_dist_name = {
        fn_to_dist_name(pkg["fn"]): pkg for pkg in fetch_actions
    }

    non_fetch_packages = link_dists - set(fetch_by_dist_name)
    if len(non_fetch_packages) > 0:
        for search_res in search_for_md5s(
            conda_path,
            [x for x in link_actions if x["dist_name"] in non_fetch_packages],
            platform,
        ):
            dist_name = fn_to_dist_name(search_res["fn"])
            fetch_by_dist_name[dist_name] = search_res

    pkgs = []
    for pkg in link_actions:
        url = fetch_by_dist_name[pkg["dist_name"]]["url"]
        md5 = fetch_by_dist_name[pkg["dist_name"]]["md5"]
        pkgs.append(f"{url}#{md5}")

    return pkgs

def lock_env_data(env_data):
    """Convert conda environment dependencies to locked specs using conda_lock.

    Args:
        env_data (dict): parsed contents of an environment.yml file, with
            'channels' and 'dependencies' keys.
    """
    # split dependencies into conda dependencies and pip dependencies
    deps = env_data['dependencies']
    conda_deps = [dep for dep in deps if isinstance(dep, str)]
    pip_deps = [dep for dep in deps if not isinstance(dep, str)]
    if len(pip_deps) > 1:
        raise ValueError("there is more than one dictionary in dependencies. Should be only pip")

    locked_conda_deps = lock_conda_specs(conda_deps, env_data['channels'])
    if pip_deps:
        locked_conda_deps.append(pip_deps[0])
    locked_env_data = deepcopy(env_data)
    locked_env_data['dependencies'] = locked_conda_deps
    return locked_env_data

def lock_env_file(env_file, locked_env_file):
    env_data = load_env_file(env_file)
    locked_env_data = lock_env_data(env_data)
    write_env_file(locked_env_data, locked_env_file)
    return locked_env_file

As an example, it converts this environment:

name: test
channels:
 - conda-forge
dependencies:
 - python=3.7
 - pandas=1.0.5
 - pip:
    - sidetable=0.7.0
prefix: /opt/conda/envs/test

To this:

channels:
- conda-forge
dependencies:
- https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
- https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2020.6.20-hecda079_0.tar.bz2#1b1cca86e95c416a8e7eb6062af6d503
- https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.34-hc38a660_9.tar.bz2#aa1e7603f8dd36f8d60026cda3f1fb2c
- https://conda.anaconda.org/conda-forge/linux-64/libgfortran-ng-7.5.0-hdf63c60_16.tar.bz2#d403b27c431064370f9d1b1962f8a86b
- https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-9.3.0-hdf63c60_16.tar.bz2#2c7c23cdad4f42f924d19029ef97475c
- https://conda.anaconda.org/conda-forge/linux-64/libgomp-9.3.0-h24d8f2e_16.tar.bz2#48f89ebfddb4ac93e74b0f4ab14c4a13
- https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-1_gnu.tar.bz2#561e277319a41d4f24f5c05a9ef63c04
- https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-9.3.0-h24d8f2e_16.tar.bz2#846daf5c2a4dd387047cc5ccc6b9c613
- https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1007.tar.bz2#11389072d7d6036fd811c3d9460475cd
- https://conda.anaconda.org/conda-forge/linux-64/libopenblas-0.3.10-pthreads_hb3c22a3_4.tar.bz2#8e3914247353e97a184909dbee132bfb
- https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.2-he1b5a44_1.tar.bz2#d3da4932f3d8e6b3c81fcf177d1e6eab
- https://conda.anaconda.org/conda-forge/linux-64/openssl-1.1.1g-h516909a_1.tar.bz2#6fdcd20ec22aeffa10b6102bccc47e7f
- https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.5-h516909a_1.tar.bz2#33f601066901f3e1a85af3522a8113f9
- https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h516909a_1009.tar.bz2#93486907c6757170a5125198506d9cf8
- https://conda.anaconda.org/conda-forge/linux-64/libblas-3.8.0-17_openblas.tar.bz2#fdd1790e564778bf0c616e639badfe58
- https://conda.anaconda.org/conda-forge/linux-64/readline-8.0-he28a2e2_2.tar.bz2#4d0ae8d473f863696088f76800ef9d38
- https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.10-hed695b0_0.tar.bz2#9a3e126468fa7fb6a54caad41b5a2d45
- https://conda.anaconda.org/conda-forge/linux-64/libcblas-3.8.0-17_openblas.tar.bz2#28f6376d1c4ca5e0fc287fb0484e37a1
- https://conda.anaconda.org/conda-forge/linux-64/liblapack-3.8.0-17_openblas.tar.bz2#09cfbcdb4888dc9010b4cbc60e55c6ad
- https://conda.anaconda.org/conda-forge/linux-64/sqlite-3.33.0-h4cf870e_0.tar.bz2#b22603a9c94d2cda5911f7a2cd55aa95
- https://conda.anaconda.org/conda-forge/linux-64/python-3.7.8-h425cb1d_1_cpython.tar.bz2#3197fc7597f6d13d32350dd93e15f3e2
- https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.7-1_cp37m.tar.bz2#658a5c3d766bfc6574480204b10a6f20
- https://conda.anaconda.org/conda-forge/noarch/pytz-2020.1-pyh9f0ad1d_0.tar.bz2#e52abc1f0fd70e05001c1ceb2696f625
- https://conda.anaconda.org/conda-forge/noarch/six-1.15.0-pyh9f0ad1d_0.tar.bz2#1eec421f0f1f39e579e44e4a5ce646a2
- https://conda.anaconda.org/conda-forge/linux-64/numpy-1.19.1-py37h7ea13bd_2.tar.bz2#f05213c1f8539d8ee086139df2b762c7
- https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.1-py_0.tar.bz2#0d0150ed9c2d25817f5324108d3f7571
- https://conda.anaconda.org/conda-forge/linux-64/pandas-1.0.5-py37h0da4684_0.tar.bz2#6fddaa88968614a9be807964f586e91c
- pip:
  - sidetable=0.7.0
name: test
prefix: /opt/conda/envs/test

mariusvniekerk commented 4 years ago

@sterlinm one relatively crude approach that we can take is to harvest the pip packages and embed them as a special comment in the lockfile that installers can use.
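
A minimal sketch of the special-comment idea, assuming a made-up `# pip: ` marker. The marker and both helper names are hypothetical, not a format conda-lock defines:

```python
PIP_MARKER = "# pip: "  # hypothetical marker; not a format conda-lock defines


def embed_pip_packages(lock_lines, pip_requirements):
    """Append pip requirements to lockfile lines as marker comments."""
    return list(lock_lines) + [PIP_MARKER + req for req in pip_requirements]


def extract_pip_packages(lock_lines):
    """Recover the pip requirements that an installer would pass to pip."""
    return [
        line[len(PIP_MARKER):].strip()
        for line in lock_lines
        if line.startswith(PIP_MARKER)
    ]


lock = [
    "@EXPLICIT",
    "https://conda.anaconda.org/conda-forge/noarch/six-1.15.0-pyh9f0ad1d_0.tar.bz2#1eec421f0f1f39e579e44e4a5ce646a2",
]
locked = embed_pip_packages(lock, ["sidetable==0.7.0"])
pip_reqs = extract_pip_packages(locked)
```

Because the extra lines start with `#`, a tool that treats them as ordinary comments would skip them, while an installer that knows the marker could run the pip installs afterwards.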

nbren12 commented 3 years ago

It would be wonderful to make progress on this issue. Does the new pip resolver help? Also, pip-tools can create "lock" files as well.

mariusvniekerk commented 3 years ago

Using the new resolver / pip-tools might be feasible with some pretty aggressive hacks

  1. do the regular conda solve with an added pip
  2. determine which of those packages are in fact python packages (non-trivial)
  3. reverse name map those conda names to pypi names (similar to what we do for pyproject.toml etc)
  4. use those names + resolved version numbers + the pip specified packages to generate a requirements.in
  5. solve that thing with pip-tools
  6. prune away all the things that conda provides
  7. add the pip packages + hashes to a magic comment somewhere in the lock
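
Steps 3, 4 and 6 above could be sketched roughly as follows. The name map, the helper names, and the stand-in result of the pip-tools solve are all assumptions for illustration:

```python
import re

# Hypothetical conda -> PyPI name map; a real one would be much larger.
CONDA_TO_PYPI = {"pytorch": "torch"}


def normalize(name):
    """Normalize a project name so that different spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_requirements_in(conda_python_pkgs, pip_specs):
    """Steps 3-4: pin conda-provided Python packages, then add the pip specs."""
    pins = [
        f"{CONDA_TO_PYPI.get(name, name)}=={version}"
        for name, version in sorted(conda_python_pkgs.items())
    ]
    return pins + list(pip_specs)


def prune_conda_provided(resolved_pins, conda_python_pkgs):
    """Step 6: keep only the pins that conda does not already provide."""
    provided = {normalize(CONDA_TO_PYPI.get(n, n)) for n in conda_python_pkgs}
    return [
        pin for pin in resolved_pins
        if normalize(pin.split("==")[0]) not in provided
    ]


conda_pkgs = {"numpy": "1.19.1", "pandas": "1.0.5", "python-dateutil": "2.8.1"}
requirements_in = build_requirements_in(conda_pkgs, ["sidetable"])

# Stand-in for the output of step 5 (the pip-tools solve), not a real result.
resolved = [
    "numpy==1.19.1",
    "pandas==1.0.5",
    "python-dateutil==2.8.1",
    "sidetable==0.7.0",
]
pip_only = prune_conda_provided(resolved, conda_pkgs)
```

Pinning the conda-resolved versions in the input file is what keeps the pip solve from choosing versions that disagree with the conda environment.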

And for the installer

  1. Teach the installer about the magic comment block

nbren12 commented 3 years ago

@mariusvniekerk That seems like a feasible approach. We do something similar now, but don't actually ensure the transitive dependencies of the anaconda and pip packages are compatible.

Would it be possible to reverse the order of the pip and conda resolution in your algorithm? One thing we've noticed is that pip-tools does not work very well for some packages (e.g. cartopy) which require system libraries to be installed before running the setup.py. Since we are using conda anyway, it would be nice to avoid running pip-compile on these tricky packages.

nbren12 commented 3 years ago

Also, this tool seems relevant. It somehow combines conda, pip, and nix packages, and has its own dependency resolution approach: https://github.com/DavHau/mach-nix.

RafalSkolasinski commented 3 years ago

Just to chip in: support for pip packages is currently what stops us from exploring this tool. We are looking for an option to "lock" environments for ML model serving. Some packages that we need to include do not come as conda packages and need to be listed as pip deps, and locking their versions, together with their 2nd+ level deps, is critical.

nbren12 commented 3 years ago

@RafalSkolasinski we have a similar problem, but we use conda lock for the anaconda dependencies and pip-tools for the pip packages. Of course, there could be some inconsistencies between the conda and pip lock files, but the setup is still deterministic, so it doesn't break randomly.

jli commented 3 years ago

I have the same issue as @RafalSkolasinski and @nbren12. I'm considering using @nbren12's approach of running the conda-lock and pip-compile tools independently, but I'm a bit nervous about incompatibilities from pip overwriting the conda dependency versions, and also this results in wasted space in Docker images. Still, good point that at least it's deterministic.

For posterity, I asked a StackOverflow question about this: https://stackoverflow.com/questions/68171629/how-do-i-pin-versioned-dependencies-in-python-when-using-both-conda-and-pip
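
One way to spot the overlap described above is to compare package names across the two lock files. This is only a sketch with hypothetical helper names. It catches a package only when conda and PyPI use the same name, and the lock contents below are illustrative:

```python
import re


def normalize(name):
    """Normalize a project name so that different spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conda_names(explicit_lock_lines):
    """Extract package names from explicit-lock URLs."""
    names = set()
    for line in explicit_lock_lines:
        if not line.startswith("http"):
            continue
        filename = line.split("#", 1)[0].rsplit("/", 1)[-1]
        for ext in (".tar.bz2", ".conda"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
        # File names are <name>-<version>-<build>; the name may contain "-".
        names.add(normalize(filename.rsplit("-", 2)[0]))
    return names


def pip_names(requirement_lines):
    """Extract package names from pinned requirement lines."""
    names = set()
    for line in requirement_lines:
        line = line.strip()
        # Skip blanks, comments, and option lines such as "--hash=...".
        if line and not line.startswith(("#", "-")):
            names.add(normalize(re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]))
    return names


conda_lock = [
    "@EXPLICIT",
    "https://conda.anaconda.org/conda-forge/linux-64/numpy-1.19.1-py37h7ea13bd_2.tar.bz2#f05213c1f8539d8ee086139df2b762c7",
    "https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.1-py_0.tar.bz2#0d0150ed9c2d25817f5324108d3f7571",
]
pip_lock = ["numpy==1.19.5", "sidetable==0.7.0"]  # illustrative pins
overlap = conda_names(conda_lock) & pip_names(pip_lock)
```

Any name in `overlap` is a package that pip would install on top of the conda-provided copy.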

jli commented 3 years ago

In case it helps others, I went with a heavier-weight approach of installing the conda+pip dependencies in a temporary conda environment and then using conda env export to generate a lock file that includes both conda and pip packages.

I wrote up this approach here: https://gist.github.com/jli/b2d2d62ad44b7fcb5101502c08dca1ae

ocefpaf commented 3 years ago

@jli for a realized env like that you can use https://github.com/olegtarasov/conda-export

That "heavy weight" approach is probably the only way to solve this at the moment.

jli commented 3 years ago

@ocefpaf Hm, I'm not sure I understand how conda-export helps. It seems to go the opposite direction from what I want? (I want to give a high-level spec (w/ direct dependencies and minimal version constraints), and get out a low-level lock file w/ all dependencies (including transitive) at specific versions.)

I guess conda-export would be useful to get out the high-level spec from an existing environment that was created in an adhoc way?

ocefpaf commented 3 years ago

Oh. Sorry, it was missing some context. Not using conda-export per se, but the part that can figure out what was pip installed vs conda installed could be used to perform a "conda-lock" in a realized env.
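
A rough sketch of that split for a realized env, assuming the JSON output of `conda list --json` marks pip-installed packages with a `channel` of `pypi`. That field value is an assumption here, and the sample records are made up for illustration:

```python
import json


def split_by_installer(conda_list_json):
    """Split package records into (conda_installed, pip_installed).

    Assumes pip-installed packages are reported with channel "pypi".
    """
    records = json.loads(conda_list_json)
    pip_pkgs = [r for r in records if r.get("channel") == "pypi"]
    conda_pkgs = [r for r in records if r.get("channel") != "pypi"]
    return conda_pkgs, pip_pkgs


# Made-up sample standing in for the real command output.
sample = json.dumps([
    {"name": "pandas", "version": "1.0.5", "channel": "conda-forge"},
    {"name": "sidetable", "version": "0.7.0", "channel": "pypi"},
])
conda_pkgs, pip_pkgs = split_by_installer(sample)
pip_pins = [f"{r['name']}=={r['version']}" for r in pip_pkgs]
```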

mariusvniekerk commented 2 years ago

This is now supported by #122. The PR is #124.

srstsavage commented 2 years ago

Original example environment in this issue includes a git+https:// pip dependency, which doesn't work in conda-lock 1.1.1.

$ cat environment.yml
name: test
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip:
    - xarray
    - git+https://github.com/pandas-dev/pandas.git@v1.4.4
$ conda list | grep conda-lock
conda-lock                1.1.1              pyhd8ed1ab_0    conda-forge
$ conda-lock -p osx-64 -p linux-64 2>&1 | tail
    parsed_req = Requirement.parse(requirement_specifier)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3139, in parse
    req, = parse_requirements(s)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3084, in parse_requirements
    yield Requirement(line)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3094, in __init__
    super(Requirement, self).__init__(requirement_string)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'+https:/'": Expected stringEnd

#197 is related but not currently equivalent, since it specifically references installing from private GitHub repos. Should that issue be expanded, this one reopened, or a new one created?
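
For illustration, a sketch of how VCS-style pip entries could be separated from name-based requirement specifiers before they reach a requirement parser. The helper is hypothetical and is not how conda-lock handles this:

```python
# URL prefixes that pip accepts for version-control installs.
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")


def partition_pip_specs(specs):
    """Split pip entries into (name_based, vcs_urls).

    Hypothetical helper for illustration; not part of conda-lock.
    """
    named, vcs = [], []
    for spec in specs:
        # Handle both "git+https://..." and "name @ git+https://..." forms.
        target = spec.split(" @ ", 1)[1] if " @ " in spec else spec
        if target.strip().startswith(VCS_PREFIXES):
            vcs.append(spec)
        else:
            named.append(spec)
    return named, vcs


named, vcs = partition_pip_specs([
    "xarray",
    "git+https://github.com/pandas-dev/pandas.git@v1.4.4",
])
```

Only the `named` entries would go through the requirement parser; the `vcs` entries would need separate handling, or at least a clear error message instead of a traceback.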