conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Hanging on my Local Environment when including PyPi dependencies #315

Closed srilman closed 1 year ago

srilman commented 1 year ago

I have noticed that recently, conda-lock seems to hang on my local machine when building a lockfile for a specification that includes pip dependencies. I do not see any hangs when building conda-only lockfiles, and when I tried conda-lock a couple of months ago, I did not see this problem.

For example, I tested building a lockfile for the following environment file

channels:
  - conda-forge

platforms:
  - linux-64

dependencies:
  - pip
  - pip:
    - deltalake

I ran it using the following command

pipx run conda-lock -f ex.yml --log-level DEBUG    

I also tested the main branch version by running

pipx run --spec "git+https://github.com/conda-incubator/conda-lock.git@main" conda-lock -f ex.yml --log-level DEBUG

When I tested in a Docker container, building the lockfile took under a minute. When testing on the base machine (a Mac running Python 3.10 installed via Homebrew), it was still running even after 10 minutes. This was the last log output:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.org:443
DEBUG:urllib3.connectionpool:https://pypi.org:443 "GET /pypi/deltalake/json HTTP/1.1" 304 0
DEBUG:conda_lock._vendor.poetry.repositories.pypi_repository:<debug>PyPI:</debug> 31 packages found for deltalake *
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.org:443
DEBUG:urllib3.connectionpool:https://pypi.org:443 "GET /pypi/pyarrow/json HTTP/1.1" 304 0
DEBUG:conda_lock._vendor.poetry.repositories.pypi_repository:<debug>PyPI:</debug> No release information found for pyarrow-0.1.0, skipping
DEBUG:conda_lock._vendor.poetry.repositories.pypi_repository:<debug>PyPI:</debug> 5 packages found for pyarrow >=7
DEBUG:conda_lock._vendor.poetry.repositories.pypi_repository:<debug>PyPI:</debug> The cache for pyarrow 10.0.1 is outdated. Refreshing.
DEBUG:conda_lock._vendor.poetry.repositories.pypi_repository:<debug>PyPI:</debug> Getting info for pyarrow (10.0.1) from PyPI
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.org:443
DEBUG:urllib3.connectionpool:https://pypi.org:443 "GET /pypi/pyarrow/10.0.1/json HTTP/1.1" 200 5612
srilman commented 1 year ago

Deleting poetry's cache directory (https://python-poetry.org/docs/configuration/#cache-dir) seems to have fixed the issue.
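For anyone hitting the same hang, Poetry's cache lives in a platform-dependent directory. A minimal sketch of where to look (paths follow Poetry's documented defaults; `POETRY_CACHE_DIR` overrides them if set):

```python
import sys
from pathlib import Path

def poetry_cache_dir() -> Path:
    """Return Poetry's default cache directory for the current platform."""
    home = Path.home()
    if sys.platform == "darwin":
        return home / "Library" / "Caches" / "pypoetry"
    if sys.platform == "win32":
        return home / "AppData" / "Local" / "pypoetry" / "Cache"
    # Linux and other POSIX platforms
    return home / ".cache" / "pypoetry"

print(poetry_cache_dir())
```

Deleting that directory (or just its `cache/repositories` subdirectory) is what resolved the hang here.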

So between the first time I tested conda-lock and now, I did install and run poetry. My guess is that there is some difference between how poetry 1.3 (current version) handles its cache and how the vendored poetry handles it.

In order to avoid issues like this in the future, I think it would be best if the vendored Poetry pointed its cache to a different directory. Any thoughts?

mariusvniekerk commented 1 year ago

Generally for complex dependencies that pull in lots of things like deltalake you may want to inspect their dependencies manually to see if they have any dependencies that are already available on conda.

In this case

$ grayskull pypi deltalake
...
Build requirements:
  <none>
Host requirements:
  - python
  - pip
Run requirements:
  - python
  - pyarrow >=7
  - typing-extensions   # [py<38]

so adding in pyarrow as a conda dep would fix your problem
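Concretely, the suggestion would make the example spec look something like this (illustrative):

```yaml
channels:
  - conda-forge

platforms:
  - linux-64

dependencies:
  - pip
  - pyarrow >=7   # pull deltalake's heaviest dependency from conda-forge instead of PyPI
  - pip:
    - deltalake
```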

srilman commented 1 year ago

Actually in my original use case, I included PyArrow as a conda dep. That did not change anything.

maresb commented 1 year ago

@srilman, I'm trying to get caught up here, and I noticed

> In order to avoid issues like this in the future, I think it would be best if the vendored Poetry pointed its cache to a different directory.

If you think this would help, it makes sense to me. I really want to get that other stuff merged, but perhaps we could also think about a PR for this? (Note: maintenance will be much easier if you can avoid touching any of the vendored poetry code.)

srilman commented 1 year ago

@maresb I think we should. I found a couple of other situations where this ended up being the root problem, and I see that someone else had a similar situation too.

Any thoughts on a potential fix? The crude approach is to just modify the POETRY_CACHE_DIR environment variable at the top of the main script. But it might be better if there was some way to pass it in directly.
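The crude approach could look like the following sketch at the top of conda-lock's entry point, before any vendored Poetry module is imported (the directory name is illustrative; `setdefault` preserves a user's explicit override):

```python
# Hypothetical sketch: point the vendored Poetry at a conda-lock-specific
# cache so it never shares state with a real Poetry installation.
import os
import tempfile

os.environ.setdefault(
    "POETRY_CACHE_DIR",
    os.path.join(tempfile.gettempdir(), "conda-lock-pypoetry"),
)
```

The downside, as noted, is that mutating the environment from inside the program is fragile compared to passing the path in directly.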

maresb commented 1 year ago

Ya, it's definitely best to find a way to pass it in directly, if possible. But I suspect that we can't.

After a quick glance, it looks like the cache directory is set in _vendor/poetry/locations.py, and it doesn't look like there's a good way to configure it.

It seems that pypoetry is Poetry's universal prefix, so perhaps the best way would be to add a substitution in pyproject.toml replacing "pypoetry" with "pypoetry-conda-lock". Then we should re-vendor Poetry... but unfortunately that is a very involved process which I hope to get to within the next month or two. (There are many subtle details to check, and there are complications because Poetry itself vendors several other packages.) As an interim solution, we could carry out the substitution by hand.
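The by-hand substitution in `_vendor/poetry/locations.py` could look roughly like this (names and helper are illustrative; the vendored module's actual code may differ):

```python
# Sketch of redirecting the vendored Poetry's cache directory.
import os
from pathlib import Path

def user_cache_dir(appname: str) -> str:
    """Simplified appdirs-style helper (Linux case; real code branches per OS)."""
    base = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return os.path.join(base, appname)

# Before: CACHE_DIR = user_cache_dir("pypoetry")
CACHE_DIR = user_cache_dir("pypoetry-conda-lock")
REPOSITORY_CACHE_DIR = Path(CACHE_DIR) / "cache" / "repositories"
```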

What do you think?

srilman commented 1 year ago

Sounds like a good approach in general. It'd be best to avoid other potential conflicts anyway.