conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/
Other
467 stars 102 forks source link

Default PyPI dependencies for pyproject.toml #334 #362

Closed pep-sanwer closed 1 year ago

pep-sanwer commented 1 year ago

Quick draft implementation of changes discussed in #334:

Implements a default-poetry-source-pypi flag in [tool.conda-lock] to default to source = "pypi" for all dependencies in [tool.poetry.dependencies], i.e.:

by explicitly providing default-poetry-source-pypi = true in [tool.conda-lock] section, e.g.:

[tool.conda-lock]
default-poetry-source-pypi = true

Updated Readme and tests to reflect changes. Please let me know any thoughts or comments!

netlify[bot] commented 1 year ago

Deploy Preview for conda-lock ready!

Name Link
Latest commit 629eced3f7a980fd97baa8d70ddd9f09be326c42
Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/646e33bee897dd0007682509
Deploy Preview https://deploy-preview-362--conda-lock.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

maresb commented 1 year ago

Thanks @pep-sanwer, this looks good.

Could you please remove the references to Poetry? Simply because specifying dependencies in pyproject.toml is not Poetry-specific. (All standards-compliant build tools specify their dependencies in pyproject.toml, not just Poetry.) The standard location for specifying dependencies in pyproject.toml is [project.dependencies], used by all major build backends except for Poetry (when dependencies aren't [dynamically]() specified).

Unfortunately my alternative suggestions for wording and variable names at the moment are a bit awkward... I'd say that dependencies specified in pyproject.toml are "project dependencies" or pyproject.toml dependencies. Maybe call the variable default-pyproject-source?

Finally, I think we may be limiting ourselves by having the setting be a boolean. Namely, we may want to specify a channel, or a list of channels. How about something like:

dependency-source-priority = ["pypi", "conda-forge"]

@croth1, could you please have a look at this? (He has been thinking about channels lately.)

croth1 commented 1 year ago

With limited knowledge of the whole system here are my two cents:

I like the idea of ultimately having something like: dependency-source-priority = ["pypi", "conda-forge"] - the way we now approach python dependencies as a two-stage solving process, would probably make it incredibly hard to allow mixing pypi and conda channels in the dependency declaration (e.g. ["private-pypi", "pytorch", "conda-forge", "pypi"]. So naturally I feel that we might converge towards separate: pypi-source-priority = ["private-pypi", "pypi"] and conda-channels. However this would still require the boolean choice which one comes first.

(side note: I find it extremely attractive to explore whether instead of solving with poetry ontop of conda, and then trying to mingle everything together, we could make a fake-repo just like for the virtual packages and solve everything together with the conda solver. While this seems technically much harder, I have a strong suspicion that this will be conceptually much easier and a lot of headaches we have right now might just magically disappear)

Back to the topic 😬 - as I see it the changes in this PR are completely poetry-toml specific and I feel it's worth offering this, without having it to fit into a grand vision. For now, I would suggest keeping the changes as they are and configure in a poetry specific namespace:

[tool.conda-lock.poetry_toml]
default-dependencies-to-conda = false

Should be easy to deprecate and pull these configurations into the main [tool.conda-lock] section once we have source-agnostic general solution. What do think?

maresb commented 1 year ago

Thanks a lot @croth1 for the feedback!

I agree that the "right" way to do things would be with a unified solver. I'm very interested in looking into what this would involve, and I suspect we could use something like libsolv. It might be interesting to read up on the details and ping wolfv for advice. But unless some expert like wolfv steps in, I don't see this happening anytime soon.

As for a short/medium-term solution, based on @croth1's advice, I think it makes sense that we specify which package manager (pip vs conda) to prefer by default.

As for terminology, we already have this in the new-style unified lockfile. There we use a categorical value package[].manager with values conda or pip. While this is a binary choice, I don't see it as a "boolean" true/false choice, so I think strings would be better here. For example:

[tool.conda-lock]
default-project-dependency-manager = "pip"

Now that we have identified "dependency managers" rather than "channels" as the appropriate level of abstraction, I wonder again if it might make sense to use a list here, so something like

dependency-manager-priority = ["conda", "pip"]  # the current default
dependency-manager-priority = ["pip"]  # pip-only
dependency-manager-priority = ["pip", "conda"]  # pip with conda as fallback
dependency-manager-priority = ["conda"]  # strict conda mode

Do you think this would be useful? Or more importantly doable?

Terminology rant...

In short, in order to avoid perpetuation of confusion, I simply want to use the proper terminology. Specifically, "dependencies specified in pyproject.toml" are not "Poetry dependencies" in the same sense that "pain-relief drugs" are not "Tylenol".

I've seen widespread confusion (and been confused myself) about the Python packaging ecosystem, and a big part of this appears to be generization which falsely conflates various tools and standards. (For example, I used to conflate pip with setuptools.)

I get concerned when I see "poetry-toml", because this serves to perpetuate confusion that pyproject.toml is a Poetry file. All modern Python projects should strive to declare their dependencies in pyproject.toml, whether or not they use Poetry. Poetry is not required in order to use a pyproject.toml. Moreover, conda-lock already supports non-Poetry pyproject.toml usage.

For a concrete example, conda-lock doesn't use Poetry, and conda-lock can generate lockfiles for itself (up to a few remaining general issues).

maresb commented 1 year ago

What's the expected behavior for non-Poetry pyproject.toml files, i.e. when dependencies are defined in [project.dependencies]?

(e.g. conda-lock)

pep-sanwer commented 1 year ago

@croth1, @maresb thank you both for the wonderful feedback and comments.

Re: Terminology & Poetry-namings & Abstracting away from specific tools I absolutely agree with & take your points @maresb - especially on terminology - my only reasoning for the namings in this PR was to reflect that I've only touched how Poetry-listed-dependencies are handled by Conda-lock. This is primarily a consequence of my desire to stay humble and small with my repo touches (and the fact that I have no familiarity with Poetry-alternatives). I also agree with the long-term route of dependency-manager-priority lists as being a powerful knob to expose to users.

I especially like @croth1's short-term resolution suggestion of (with slight renaming)

[tool.conda-lock.poetry]
default-dependencies-to-conda = false

Re: expected behavior for non-Poetry pyproject.toml files, i.e. when dependencies are defined in [project.dependencies] - I would imagine non-Poetry users would find useful having this dependency management control option open to them.

pep-sanwer commented 1 year ago

@maresb

The latest changes on the PR implement the discussed behavior across all currently supported pyproject.toml flavors, (poetry, pep621(flit, pdm)).

Naming / config path has changed to reflect - setting:

[tool.conda-lock]
default-non-conda-source = "pip"

will treat all non-conda dependencies -- all dependencies defined outside of [tool.conda-lock.dependencies] -- as pip dependencies, i.e.:

Appreciate all your feedback!

maresb commented 1 year ago

This looks great!!! Thanks so much for the update.

I won't be able to look at it this week unfortunately, due to time constraints.

pep-sanwer commented 1 year ago

Understood. Appreciate the heads-up!

pep-sanwer commented 1 year ago

@maresb, please let me know if I can help ease this merge in anyway

Appreciative of your time!

maresb commented 1 year ago

Thanks so much for your perseverance @pep-sanwer!

Let's do one more rebase. I'll squeeze this in before the Poetry revendor.

maresb commented 1 year ago

The branch is out-of-date again, but don't bother rebasing this time, as it should be totally independent.

Now we just have to deal with this recent flakiness with the failing tests... :disappointed:

pep-sanwer commented 1 year ago

Got it - and yes most recent commits look completely independent. I'll the PR as is then!

pep-sanwer commented 1 year ago

@maresb, Thank you for merging this! Unfortunately, and my apologies for headache, my work was a bit sloppy. As far as I can tell,

[tool.conda-lock]
default-non-conda-source = "pip"

breaks poetry pyproject.toml (and I am sure any other flavor that lists python / requires-python version as a dependency) lock-file resolution.

I haven't delved too deeply into the cause - but due to how the default-non-conda-source = "pip" logic is implemented - i.e:

all dependencies defined outside of [tool.conda-lock.dependencies] -- are pip dependencies, i.e.:

  • Default to pip dependencies for [tool.poetry.dependencies], [project.dependencies], etc.
  • Default to conda dependencies for [tool.conda-lock.dependencies]

python version is being treated as a pip/pypi managed dependency, which obviously breaks the lock resolution.

A fix on my local (only for poetry, would love a pointer on how to handle pdm/filt): image

conda-lock on  default-poetry-source-pypi [!] via 🐍 v3.8.16 via 🅒  conda_lock_test
λ git diff

Δ conda_lock/src_parser/pyproject_toml.py
────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────┐
• 166: def parse_poetry_pyproject_toml( │
────────────────────────────────────────┘
            extras = []
            in_extra: bool = False

            if (depname == "python"):
                manager = "conda"

            # Extras can only be defined in `tool.poetry.dependencies`
            if default_category == "main":
                in_extra = category != "main"
maresb commented 1 year ago

Hey, thanks a lot for the report @pep-sanwer! No worries, this stuff happens. It looks like we just missed a test case. On the bright side, this is a new feature, so people aren't relying on it yet.

One quirk of Poetry is that it doesn't define the Python dependency in project.requires-python, so this might be a Poetry-only issue.

But with a brief glance through the code, I don't see anywhere that we're parsing requires-python. :see_no_evil: Maybe we're ignoring it? That's yet another issue to open.

Would you like to open a PR with your fix for Poetry plus updated tests?

pep-sanwer commented 1 year ago

@maresb - You are right, from what I can tell, python version is not ingested across the board for pyproject.toml specs! Let me sketch out a possible catch-all solve (poetry, filt/pdm) and come back with a PR. My work slack has tightened a bit unfortunately, so please forgive a protracted delay.

maresb commented 1 year ago

Thanks a lot @pep-sanwer for looking into this!

A comprehensive solution would be excellent, but in case it helps to make incremental progress, I think we could break this one up into the Poetry and non-Poetry parts as separate pieces. (I conceptualize this as two separate bugs.)