conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/
Other
467 stars 102 forks source link

Support Multiple Categories for Sub-Dependencies in Lockfile #390

Open srilman opened 1 year ago

srilman commented 1 year ago

Partially implements #278. This PR supports internally storing multiple categories for sub-dependencies (dependencies of dependencies specified in source files) and including that information in the lockfile, following the syntax specified in the associated issue.

Note that I specifically tackled sub-dependencies first because source dependencies can have the issue where they are specified in two different source files. In addition to having 2 different category specs, they may have different version specs as well. There is an existing PR (#300) to merge version constraints together, but it is blocked right now due to a discussion. I feel like this is not as common of a situation, so I decided to work on sub-dependencies first.

As discussed in the associated issue, this PR will change the resulting lockfile by adding the additional field categories. Furthermore, we will also have multiple copies of the same dependency for each category in the categories. This PR makes sure that conda-lock produces a lockfile in this format and parse lockfiles with this and the previous format correctly.

This PR builds on top of #389.

netlify[bot] commented 1 year ago

Deploy Preview for conda-lock ready!

Name Link
Latest commit 90a7f0801b3c748ecb8a7c6aad601331b5292771
Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/65963a534838f60008fe3f5e
Deploy Preview https://deploy-preview-390--conda-lock.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

srilman commented 1 year ago

@maresb Any chance you can review this sometime this week?

maresb commented 1 year ago

It's really tough for me since I'm in the middle of a big move...

I wanted to ask... have you checked that the resulting lockfiles install with Micromamba?

maresb commented 1 year ago

I feel really bad. Despite appearances I'm very eager to get this in.

srilman commented 1 year ago

@maresb No worries! I wanted to remind you just in case, but not a big deal at all. i have tested this PR on a very large production environment lockfile, but not with Micromamba (since Micromamba used to not have certain features I needed until 1.4). I will do so soon and let you know!

maresb commented 1 year ago

@srilman, I'd like to get as much as possible merged this weekend. I've lost sight a bit of all the PRs. Could you please suggest to me a merge order? Like #389, #390, and then maybe #300?

srilman commented 1 year ago

@maresb I think #300 needs to be rebased a little bit first. So if possible, could you review #389 and then #390 for now?

maresb commented 1 year ago

@srilman, great! I'm on it now...

maresb commented 1 year ago

In this PR #390 it looks like optional was removed via commits in the middle, as per #389. That means this PR will need to be rebased, right? For the moment I'm proceeding with the review, ignoring optional...

EDIT: I think I was confused... the optional I was looking at was for Poetry. :slightly_smiling_face:

maresb commented 1 year ago

Important note: in order to avoid breaking backwards compatibility, we should still emit the category key for now. (Removing category will unfortunately require a schema change.) This probably entails reincluding the previously-computed value for category, preferably fenced off with comments to aid future excision.

maresb commented 1 year ago

I'm worried now that even including categories will require a schema change, since LockedDependency is a StrictModel. Does this mean that conda-lock <=1.4.0 will balk if it finds categories in a lockfile? :thinking:

srilman commented 1 year ago

@maresb I'm going to rebase this PR and address some of the comments.

maresb commented 1 year ago

That would be wonderful, thanks so much @srilman!!!

srilman commented 9 months ago

@maresb I ran into this problem recently and couldn't get around it, so I can't put off this PR any longer. I tried naively rebasing but ran into some issues that I need your opinion with.

I noticed that a lot has changed in terms of the lockfile format. There seems to be 2 lockfile formats now, I presume because of how Pydantic works. In this PR, we add an additional field to the lockfile format, categories. Can we still add the new field to both V1 and V2, or can we only add it to V2? If we can only add it to V2, then we will need a follow-up PR to make V2 the default. If there are any other changes we want to make to the lockfile format, we should also do so soon.

What are your thoughts?

maresb commented 9 months ago

Hey @srilman, it's really great to hear from you again!!!

I already attempted and failed to rebase this in #410. I've been trying to steer the changes in a direction to make a rebase easier, so maybe it makes sense to give this another shot.

We can't change the lockfile format without big discussions from the Mamba and Rattler folks. (We would need to do a major-version release coordinated with Micromamba.) Hence the V2 format is a proposal, and we work internally with V2, but the input/output always goes through V1. And categories is V2-only. Does this make sense?

Ideally we would implement categories in V2 where everything works as it should, and then when we convert to V1 for output, it reproduces as closely as possible the current weird behavior. This way we don't have any breaking changes, and everything's ready to go with V2.

Maybe we could add some flags that are very clearly marked as experimental so that we can export a V2 format file that will only work with conda-lock.

You may be interested in #546 where we're working on merging version specifications. It'd be really great to get your feedback there if you get the chance.

srilman commented 8 months ago

@maresb Took a while, but finally rebased the PR. You're right, it seemed easier to integrate than before.

Like you said, I made categories a V2 field while expanding to multiple elements with a different category for V1. Everything seems to work fine right now, but we can add an experimental V2 flag in another PR.

I'll take a look at https://github.com/conda/conda-lock/pull/546, but it looks like that PR might need some changes from this PR.

maresb commented 8 months ago

This is very exciting, thanks so much @srilman!!!

while expanding to multiple elements with a different category for V1

Ok, but we'll need to thoroughly test this with both conda-lock install and micromamba to make sure it doesn't break anything. If everything works that would be amazing!!! How much have you done in terms of testing?

While #546 could greatly benefit from your input, I'd really like to get this in without more rebases, so let's make this PR top priority.

srilman commented 8 months ago

Ok, but we'll need to thoroughly test this with both conda-lock install and micromamba to make sure it doesn't break anything. If everything works that would be amazing!!! How much have you done in terms of testing?

Any thoughts on how to do this? Can we write unit tests for this? Probably for conda-lock, but micromamba? I wrote some unit tests for the generation portion, but not the install step. In terms of manual testing, I'm manually verifying that it works for my use case.

maresb commented 8 months ago

It would be really hard to write unit tests here for Micromamba's install process. I'm mainly interested in if/how Micromamba handles deduplication. If you have a Python dependency in both main and dev, then will Micromamba install Python twice?

Other than that, we might want to write a unit test with a simple example to make sure that we get a functioning environment after installing from a simple multi-category generated lockfile, and the test would just be running some installed thing as a subprocess.

srilman commented 8 months ago

If you have a Python dependency in both main and dev, then will Micromamba install Python twice?

In the PR, I added an extra filter pass that takes any dependency in main and another category and moves it to just main. But I get what you mean in the general sense. Any dependency in 2 optional categories that are selected at the same time.

Other than that, we might want to write a unit test with a simple example to make sure that we get a functioning environment after installing from a simple multi-category generated lockfile, and the test would just be running some installed thing as a subprocess.

Let me see what I can do, but feel free to add to this PR if you have the time to do so.

srilman commented 7 months ago

Ok so far for my use case, this PR seems to be working well. It seems to be producing a valid lockfile (based on a quick script I whipped up to make sure that the dependency tree is correct). conda-lock install worked, but I haven't tested micromamba just yet.

In addition, I added the test case for the conda-lock install step.

@maresb When you have a chance, could you review this PR and let me know if there's anything else you'd like me to do? I'd really love to see this merged soon so that I can start using it safely.

srilman commented 7 months ago

@maresb So I've started integrating this into my own use case in general, and had a chance to large-scale test and with micromamba. This is what I saw so far:

IMO the conda-lock 2.5.1 is not a blocker since merging this PR will fix it. It's just annoying for me since I'm trying to use this PR now.

srilman commented 7 months ago

FYI I think the test case I added in this PR is a good example to test Micromamba with

maresb commented 7 months ago

I've carefully looked through the first commit. It was really challenging for me to understand what's going on. Probably I will have a lot of comments to add. Here are some notes:

I'm trying to understand "Change category ā†’ categories in LockfileV2 only".

There are a few classes involved, namely:

_BaseDependency: has .category that defaults to "main"

BaseLockedDependency ā†’ LockedDependencyV1, LockedDependencyV2

We want to change LockedDependencyV2 to have categories instead of category. It's currently on BasedLockDepenedency (with default value "main"), so we need to move it from BaseLockedDependency to LockedDependency (V1). This lets us add it to LockedDependency (V2) as a Set[str].

We add categories to LockedDependencyV2 with default value set(). This forces a few adjustments:

  1. We need to return multiple LockedDependencyV1 objects when converting v2ā†’v1.
  2. We need to aggregate the multiple LockedDependencyV2 objects into a single v1 object.
  3. We need to fill in the value of categories

For 1. we convert a v2 dependency to a list of v1 dependencies that are identical except for their category field, and the category field spans the values in categories.

To convert from v1 back to v2, we start with a list of LockedDependencyV1. These will come from reading the lockfile which will contain several duplicates with the same key that differ only in category. We assemble these duplicates into a list, copy the fields from the first element of the list, and let categories be the union of the category fields from the individual elements.

For 3. we start by leaving categories empty so that they can be filled in according to transitivity.

srilman commented 7 months ago

In that case, I'm going to add some comments to make the second commit a bit easier to understand. Thanks for pointing that out.

maresb commented 5 months ago

Hey, I just wanted to say that I have not forgotten about this!!!

I've been trying to very carefully understand the fine details of this PR, and have been working on trying to make pure refactoring changes in https://github.com/conda/conda-lock/pull/589 that will allow us to simply switch over to the new system.

I feel really bad because I keep having annoying personal stuff (like bureaucracy of registering in a new country, taxes, etc.) that keeps draining away the time that I want to devote to pushing this through.

At the same time, I want to be realistic, and I haven't had the available time I envisioned to push things forward here. The folks at prefix.dev have a team working full-time doing amazing work with Conda lockfiles. So I think pixi is a more sustainable solution in the medium-term. I'd like to converge with them so that conda-lock users can smoothly transition when they're ready.

I just pinged the prefix folks here: https://discord.com/channels/1082332781146800168/1214169703811907604/1214169703811907604 It seems like they're not concerned about supporting conda-lock lockfiles, so I think as long as we support conda-lock and micromamba we'll be good to go. :rocket: