conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Solver can fail with mixed pip/conda dependencies #179

Open pconrad-insitro opened 2 years ago

pconrad-insitro commented 2 years ago

Thank you for releasing v1.0, and congratulations on the progress!

I believe I have found a corner case in the joint conda/poetry solver, having to do with package renaming. This is a very useful capability, and I'm not surprised it is subtle.

Consider this example yaml:

name: test
channels:
  - conda-forge
  - defaults
dependencies:
  - matplotlib>=3.1.2,<4 # Solve fails if this is requested from conda
  - python=3.9.*
  - pip:
    - seaborn # Depends on matplotlib
platforms:
  - linux-64

The apparent problem is that conda knows about both matplotlib and matplotlib-base, but pip only knows about matplotlib. Somewhere in the conversions between the two systems, it's getting confused and checking for the conda name in the pip list.

Comment out the matplotlib line in the spec above and it works, as the solution will be entirely pip. As is, it fails on v1.0.3 (on an Intel Mac):

conda-lock --mamba -f minimal_env.yml -p linux-64
Locking dependencies for ['linux-64']...
INFO:conda_lock.conda_solver:linux-64 using specs ['matplotlib >=3.1.2,<4', 'python 3.9.*', 'pip *']
Traceback (most recent call last):
  File "conda-lock/conda_lock/src_parser/__init__.py", line 271, in seperator_munge_get
    return d[key]
KeyError: 'matplotlib-base'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "conda_lock/src_parser/__init__.py", line 274, in seperator_munge_get
    return d[key.replace("-", "_")]
KeyError: 'matplotlib_base'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/conda-lock", line 33, in <module>
    sys.exit(load_entry_point('conda-lock', 'console_scripts', 'conda-lock')())
  File "python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "conda_lock/conda_lock.py", line 1166, in lock
    lock_func(
  File "conda_lock/conda_lock.py", line 936, in run_lock
    make_lock_files(
  File "conda_lock/conda_lock.py", line 388, in make_lock_files
    lock_content = lock_content | create_lockfile_from_spec(
  File "conda_lock/conda_lock.py", line 722, in create_lockfile_from_spec
    deps = _solve_for_arch(
  File "conda_lock/conda_lock.py", line 688, in _solve_for_arch
    pip_deps = solve_pypi(
  File "conda_lock/pypi_solver.py", line 300, in solve_pypi
    src_parser._apply_categories(requested=pip_specs, planned=planned)
  File "conda_lock/conda_lock/src_parser/__init__.py", line 285, in _apply_categories
    for dep in seperator_munge_get(planned, item).dependencies
  File "conda-lock/conda_lock/src_parser/__init__.py", line 276, in seperator_munge_get
    return d[key.replace("_", "-")]
KeyError: 'matplotlib-base'

I lightly redacted the paths, but the stack trace is hopefully clear.
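For readers following the traceback, here is a minimal sketch of the lookup it shows. The body of `seperator_munge_get` is reconstructed only from the three `return` lines visible in the trace, so the real function may differ in detail. The `planned` dictionary and the `is_planned` helper are hypothetical stand-ins for the pip solution, not conda-lock objects.

```python
def seperator_munge_get(d, key):
    # Reconstructed from the traceback: try the key as given, then with
    # "-" replaced by "_", then with "_" replaced by "-".
    try:
        return d[key]
    except KeyError:
        try:
            return d[key.replace("-", "_")]
        except KeyError:
            return d[key.replace("_", "-")]


# Hypothetical stand-in for the pip solution: PyPI only has "matplotlib".
planned = {"seaborn": object(), "matplotlib": object()}


def is_planned(name):
    """Return True if `name` resolves in `planned` under any separator spelling."""
    try:
        seperator_munge_get(planned, name)
        return True
    except KeyError:
        return False


print(is_planned("matplotlib"))       # the PyPI name
print(is_planned("matplotlib-base"))  # the conda-only name
```

Swapping separators cannot turn `matplotlib-base` into `matplotlib`, which is consistent with the chain of three `KeyError`s above.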

Any thoughts? Let me know if I can assist in debugging. I traced the code for a while. I suspect the solution is another careful application of the forward/reverse naming mapping, but I am not sure what change is best.

wosiu commented 2 years ago

Same for me. Here is a short, simple YAML example that fails:

name: ocr_ws_prod
channels:
  - defaults
dependencies:
  - python>=3.7,<3.8
  - pip:
    - Django==1.11.29

Error:

(base) root@container:~/repo# conda-lock -f env_conda_prod.yml -p linux-64
Locking dependencies for ['linux-64']...
INFO:conda_lock.conda_solver:linux-64 using specs ['python >=3.7,<3.8', 'pip *']
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/src_parser/__init__.py", line 282, in seperator_munge_get
    return d[key]
KeyError: 'Django'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/src_parser/__init__.py", line 285, in seperator_munge_get
    return d[key.replace("-", "_")]
KeyError: 'Django'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda-lock", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/conda_lock.py", line 1167, in lock
    filename_template=filename_template, check_input_hash=check_input_hash
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/conda_lock.py", line 949, in run_lock
    filter_categories=filter_categories,
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/conda_lock.py", line 393, in make_lock_files
    update_spec=update_spec,
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/conda_lock.py", line 727, in create_lockfile_from_spec
    update_spec=update_spec,
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/conda_lock.py", line 696, in _solve_for_arch
    platform=platform,
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/pypi_solver.py", line 300, in solve_pypi
    src_parser._apply_categories(requested=pip_specs, planned=planned)
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/src_parser/__init__.py", line 296, in _apply_categories
    for dep in seperator_munge_get(planned, item).dependencies
  File "/opt/conda/lib/python3.7/site-packages/conda_lock/src_parser/__init__.py", line 287, in seperator_munge_get
    return d[key.replace("_", "-")]
KeyError: 'Django'

conda version: 4.12.0, conda-lock version: 1.0.4, run from inside an Ubuntu-based Docker container.

jesshart commented 2 years ago

@pconrad-insitro & @wosiu I found using conda-forge exclusively helps me to avoid these issues. I spent a good while figuring it out and wrote up this walkthrough: https://github.com/jesshart/code-tutorials/blob/main/python/dependency-management/README.md

I actually stopped using pip and found conda-lock does what I need it to. I hope this is helpful.

Note: This is not possible for all projects.

pconrad-insitro commented 2 years ago

@jesshart - unfortunately, I can't switch to a pure conda-forge solution. However, I now realize I didn't really explain that.

We mostly use conda dependencies, but have a few that are not available (for various reasons), and hence fall back to pip dependencies. I was excited at the potential to jointly solve all the dependencies, made possible by the new features. Sadly, I hit the bug I showed.

The reproduction example I gave is contrived, since seaborn is indeed available on conda-forge. Our actual problem is very similar, though. It is likewise caused by a pip dependency with a transitive dependency on matplotlib.

@wosiu - thanks for the second example, that is even simpler!

wosiu commented 2 years ago

I can't go with conda-forge only as suggested by @jesshart, because some package versions are not available on conda-forge, whereas they are available via pip.

jesshart commented 2 years ago

@pconrad-insitro @wosiu while you wait on pip support in conda-lock, I found this might be useful (while definitely extra work):

Since you know what your direct dependencies are for pip and conda, you can make use of conda-lock for your conda packages and pip-compile for your pip packages.

  1. Remove your pip-related direct dependencies like django from your environment.yml (and the whole section of pip) and put your direct pip dependencies in a requirements.in file like so:
    # requirements.in
    django
  2. Run conda-lock as you like
  3. Run pip-compile --generate-hashes requirements.in > requirements.txt

Now you can start your conda environment from the conda-lock file and then run pip install -r requirements.txt to get the environment you desire with all packages and pinned dependencies.
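Step 1 above can be sketched in Python. This is only an illustration under stated assumptions: the environment is already loaded into a plain dict (for example with a YAML parser), and `split_pip_dependencies` is a hypothetical helper name, not part of conda-lock or pip-tools.

```python
def split_pip_dependencies(env):
    """Split an environment.yml-style dict into a conda-only environment
    and the text of a requirements.in file holding the pip dependencies."""
    conda_deps = []
    pip_deps = []
    for dep in env.get("dependencies", []):
        # The pip section appears as a one-key mapping: {"pip": [...]}
        if isinstance(dep, dict) and "pip" in dep:
            pip_deps.extend(dep["pip"])
        else:
            conda_deps.append(dep)
    # Copy the top-level dict so the input is left untouched.
    conda_env = {**env, "dependencies": conda_deps}
    requirements_in = "".join(f"{spec}\n" for spec in pip_deps)
    return conda_env, requirements_in


env = {
    "name": "ocr_ws_prod",
    "channels": ["defaults"],
    "dependencies": ["python>=3.7,<3.8", {"pip": ["Django==1.11.29"]}],
}
conda_env, requirements_in = split_pip_dependencies(env)
print(conda_env["dependencies"])
print(requirements_in, end="")
```

The conda-only dict would then be written back out for conda-lock, and the returned text saved as requirements.in for pip-compile.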

lesteve commented 2 years ago

I am seeing weird behaviour on my machine with an environment very similar to the one in the original post: sometimes it works, sometimes I get the same KeyError: 'matplotlib-base' error. I tried both conda-lock 1.0.5 and main, and they behave the same way.

Full stack-trace

```
Traceback (most recent call last):
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 282, in seperator_munge_get
    return d[key]
KeyError: 'matplotlib-base'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 285, in seperator_munge_get
    return d[key.replace("-", "_")]
KeyError: 'matplotlib_base'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/local/lesteve/miniconda3/envs/conda-lock/bin/conda-lock", line 10, in <module>
    sys.exit(main())
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 1170, in lock
    lock_func(
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 940, in run_lock
    make_lock_files(
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 388, in make_lock_files
    lock_content = lock_content | create_lockfile_from_spec(
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 726, in create_lockfile_from_spec
    deps = _solve_for_arch(
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/conda_lock.py", line 692, in _solve_for_arch
    pip_deps = solve_pypi(
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/pypi_solver.py", line 300, in solve_pypi
    src_parser._apply_categories(requested=pip_specs, planned=planned)
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 296, in _apply_categories
    for dep in seperator_munge_get(planned, item).dependencies
  File "/home/local/lesteve/miniconda3/envs/conda-lock/lib/python3.10/site-packages/conda_lock/src_parser/__init__.py", line 287, in seperator_munge_get
    return d[key.replace("_", "-")]
KeyError: 'matplotlib-base'
```

`pip` does not have anything that is called `matplotlib-base`.

I know there was some work in https://github.com/conda-incubator/conda-lock/pull/157 to better support conda/pip interplay. There are workarounds for this kind of issue in general, but I would be interested in @mariusvniekerk's insights on whether tricky issues at the conda/pip boundary might be handled one day.

More details about the issue (esp. random aspect)

I dug more into the issue, there are more details below if that can help.

To reproduce (very similar environment to the original post, with matplotlib installed through conda and seaborn through pip):

cat << EOF > /tmp/test-environment.yml
channels:
  - conda-forge
dependencies:
  - matplotlib
  - pip:
    - seaborn # Depends on matplotlib
EOF
conda-lock lock -p linux-64 -f /tmp/test-environment.yml --lockfile /tmp/conda-lock.yml

A standalone Python script (mostly taken from conda_lock.conda_lock._solve_for_arch) that reproduces the issue:

from pathlib import Path

from conda_lock.conda_solver import solve_conda
from conda_lock.pypi_solver import solve_pypi
from conda_lock.src_parser import (
    VersionedDependency,
    Selectors
)
from conda_lock.models.channel import Channel

# You probably need to adapt the `conda` path and `platform`
conda = Path("~/miniconda3/condabin/mamba").expanduser()
platform = 'linux-64'
# there is an additional channel 
# Channel(url='file:///tmp/tmpbst6eigh'), hopefully this does not change the
# logic too much
channels = [Channel.from_string('conda-forge')]

requested_deps_by_name = {
    "conda": {
        "matplotlib": VersionedDependency(
            name="matplotlib",
            manager="conda",
            optional=False,
            category="main",
            extras=[],
            selectors=Selectors(platform=None),
            version="",
            build=None,
        ),
        "pip": VersionedDependency(
            name="pip",
            manager="conda",
            optional=False,
            category="main",
            extras=[],
            selectors=Selectors(platform=None),
            version="*",
            build=None,
        ),
    },
    "pip": {
        "seaborn": VersionedDependency(
            name="seaborn",
            manager="pip",
            optional=False,
            category="main",
            extras=[],
            selectors=Selectors(platform=None),
            version="*",
            build=None,
        )
    },
}
locked_deps_by_name = {'conda': {}, 'pip': {}}

conda_deps = solve_conda(
    conda,
    specs=requested_deps_by_name["conda"],
    locked=locked_deps_by_name["conda"],
    update=[],
    platform=platform,
    channels=channels
)

conda_deps_keys = list(conda_deps.keys())
matplotlib_base_index = conda_deps_keys.index('matplotlib-base')
matplotlib_index = conda_deps_keys.index('matplotlib')

print(f"{matplotlib_base_index=}")
print(f"{matplotlib_index=}")

if matplotlib_base_index < matplotlib_index:
    print('Oh oh matplotlib-base first problems ahead')
else:
    print('matplotlib-base last lucky you')

pip_deps = solve_pypi(
    requested_deps_by_name["pip"],
    use_latest=[],
    pip_locked={},
    conda_locked={dep.name: dep for dep in conda_deps.values()},
    python_version=conda_deps["python"].version,
    platform=platform,
)

Key findings

RobertRosca commented 2 years ago

Thanks for the writeup @lesteve, I ran into this package and this issue today (during my yearly attempt to find a way to reliably maintain large mixed conda/pip environments :smiling_face_with_tear:). Your explanation saved me some time!

I've got a few ideas for how to handle this and have some (semi) working prototypes, @mariusvniekerk do you know if anybody's working on this already or could I take it up?

sergsb commented 2 years ago

Is there any progress with this problem?

bstadlbauer commented 2 years ago

Also ran into this today - it would be great if this were fixed. Happy to help out as well :-)

glemaitre commented 1 year ago

Also got the same issue here with a single dependency (python and pandas both depend on tzdata).

name: test
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip:
    - pandas

maresb commented 1 year ago

This may also be closed by #290, thanks for the pointer @lesteve!

lesteve commented 1 year ago

I believe indeed that it has been fixed by #290 and this issue can be closed. I tried conda-lock from main with the environment mentioned in the comments and they all work fine.