conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Add build string when resolving with --lock-file from environment #363

Open tdejager opened 1 year ago

tdejager commented 1 year ago

What is the idea?

It seems that the build attribute, which I assume corresponds to the build string, is not filled in when resolving an environment.yml file with mamba/micromamba/conda.

...snip..
package:
- category: main
  dependencies: {}
  hash:
    md5: d7c89558ba9fa0495403155b64376d81
    sha256: fe51de6107f9edc7aa4f786a70f4a883943bc9d39b3bb7307c04c41410990726
  manager: conda
  name: _libgcc_mutex
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
  version: '0.1'
- category: main
  dependencies: {}
  hash:
    md5: ff9f73d45c4a07d6f424495288a26080
    sha256: 8f6c81b0637771ae0ea73dc03a6d30bec3326ba3927f2a7b91931aa2d59b1789
  manager: conda
  name: ca-certificates
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2022.12.7-ha878542_0.conda
  version: 2022.12.7
- category: main
  dependencies: {}
  hash:
    md5: 7aca3059a1729aa76c597603f10b0dd3
    sha256: f6cc89d887555912d6c61b295d398cff9ec982a3417d38025c45d5dd9b9e79cd
  manager: conda
  name: ld_impl_linux-64
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda
  version: '2.40'

None of the locked packages have the build attribute. Moreover, conda_solver.py:183, where the structure is created, does not fill it in at all. I think the mamba JSON does return this.

Why is this needed?

Because it is an identifying feature for a package.

What should happen?

The code should be modified to fill in the build string where possible. We could retroactively extract it from the URL if we wanted to, but I guess filling it in at resolve time would be more correct.
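For illustration, the URL-based fallback mentioned above could look roughly like the sketch below. It assumes the usual `<name>-<version>-<build>` filename layout with a `.conda` or `.tar.bz2` extension; `parse_conda_url` and `ParsedFilename` are hypothetical names, not part of conda-lock.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ParsedFilename(NamedTuple):
    name: str
    version: str
    build: str
    build_number: Optional[int]


def parse_conda_url(url: str) -> ParsedFilename:
    """Split a conda package URL into name, version, build string and build number."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    for ext in (".tar.bz2", ".conda"):
        if filename.endswith(ext):
            filename = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"not a conda package URL: {url}")
    # Package names may contain dashes, so split from the right:
    # <name>-<version>-<build>
    name, version, build = filename.rsplit("-", 2)
    # Assumption: the build number is the trailing "_<digits>" of the build
    # string. This is only a convention, so fall back to None otherwise.
    tail = build.rsplit("_", 1)[-1]
    build_number = int(tail) if tail.isdigit() else None
    return ParsedFilename(name, version, build, build_number)
```

Applied to the `_libgcc_mutex` URL from the snippet above, this yields the build string `conda_forge` but no build number, because that build string does not end in digits. That is one reason a value reported by the solver is preferable to one guessed from the URL.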

Additional Context

We are using the lock files at Prefix; if you want, we could take a stab at a PR :)

maresb commented 1 year ago

Sounds good to me, perhaps under build_name? Note that this involves a (non-breaking, I think) change to the unified lockfile spec.

@mariusvniekerk, now that we're post-graduation, is there a procedure for maintaining and updating the spec?

The initial brainstorming of the spec occurred on https://github.com/mamba-org/mamba/issues/1209.

tdejager commented 1 year ago

Ah, I supposed that the build attribute was meant for this: https://github.com/conda/conda-lock/blob/5bdab2b36c10de8e30054c0345ab3ebc195ecdfb/conda_lock/lockfile/models.py#L50

Maybe it's something else?

maresb commented 1 year ago

Ah, indeed! Given that, from my perspective, I think this should be straightforward to get merged.

tdejager commented 1 year ago

There is another thing I should have mentioned before: conda packages also have a build number, and that is not included in the format at all.

We could just put it in the build attribute (it's at the end of the build string), but it's better to be explicit, I suppose.

WDYT? @maresb

maresb commented 1 year ago

We get build number from micromamba; see https://github.com/conda/conda-lock/issues/338#issuecomment-1428699111. The format from Conda is a bit different though. I don't remember off the top of my head.

maresb commented 1 year ago

Conda/Mamba:

      {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 0,
        "build_string": "pyhd8ed1ab_0",
        "channel": "conda-forge",
        "dist_name": "pip-23.0.1-pyhd8ed1ab_0",
        "name": "pip",
        "platform": "noarch",
        "version": "23.0.1"
      }

Micromamba:

            {
                "build": "pyhd8ed1ab_0",
                "build_number": 0,
                "build_string": "pyhd8ed1ab_0",
                "channel": "https://conda.anaconda.org/conda-forge/noarch",
                "constrains": null,
                "depends": [
                    "setuptools",
                    "wheel",
                    "python >=3.7"
                ],
                "fn": "pip-23.0.1-pyhd8ed1ab_0.conda",
                "license": "MIT",
                "md5": "8025ca83b8ba5430b640b83917c2a6f7",
                "name": "pip",
                "sha256": "e1698cbf4964cd60a2885c0edbc654133cd0db5ac4cb568412250e577dbc42ad",
                "size": 1366466,
                "subdir": "noarch",
                "timestamp": 1676670714,
                "track_features": "",
                "url": "https://conda.anaconda.org/conda-forge/noarch/pip-23.0.1-pyhd8ed1ab_0.conda",
                "version": "23.0.1"
            }

So build_number should be straightforward.
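As a rough sketch, both record shapes quoted above can be reduced to the same fields. The helper name `extract_build_info` is hypothetical, and the only keys assumed are the ones visible in the two JSON samples.

```python
from typing import Any, Dict, Mapping


def extract_build_info(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Pull the identifying build fields out of a dry-run JSON package record."""
    # Conda/Mamba report "build_string"; Micromamba reports both
    # "build_string" and "build". Prefer "build_string", fall back to "build".
    build = record.get("build_string") or record.get("build")
    return {
        "name": record["name"],
        "version": record["version"],
        "build": build,
        # Present in both samples; None if a solver ever omits it.
        "build_number": record.get("build_number"),
    }


conda_record = {
    "build_number": 0,
    "build_string": "pyhd8ed1ab_0",
    "name": "pip",
    "version": "23.0.1",
}
micromamba_record = {
    "build": "pyhd8ed1ab_0",
    "build_number": 0,
    "build_string": "pyhd8ed1ab_0",
    "name": "pip",
    "version": "23.0.1",
}
# Both shapes reduce to the same normalized dictionary.
same = extract_build_info(conda_record) == extract_build_info(micromamba_record)
```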

Also on my wishlist for a long time has been the package upload date, or timestamp. It exists in the repodata, but is for some reason not included by Conda in the dry-run json.

baszalmstra commented 1 year ago

For me, it would be ideal if the conda-lock file contains all the identifying properties that enable matching a MatchSpec. This would mean that all properties found in the repodata should also be present in the conda-lock file, since MatchSpec can technically match against any of those properties.

I would like to have this feature so we can check if MatchSpecs in an environment.yml file already match a conda-lock file. If they do, we can skip the solving process altogether.
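To make the idea concrete, here is a deliberately simplified sketch of the "skip the solve" check. `SimpleSpec` is a hypothetical stand-in that only understands a name plus optional glob patterns for version and build; a real MatchSpec supports version ranges, channels and more, so this is not how conda or rattler evaluate specs.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Iterable, Mapping, Optional


@dataclass(frozen=True)
class SimpleSpec:
    """Hypothetical, heavily reduced stand-in for a conda MatchSpec."""

    name: str
    version: Optional[str] = None  # glob pattern, e.g. "23.*"
    build: Optional[str] = None    # glob pattern, e.g. "pyh*_0"

    def matches(self, pkg: Mapping[str, Any]) -> bool:
        if pkg.get("name") != self.name:
            return False
        for field, pattern in (("version", self.version), ("build", self.build)):
            if pattern is None:
                continue
            value = pkg.get(field)
            # A field missing from the lockfile cannot be verified,
            # so it is treated as "does not match".
            if value is None or not fnmatchcase(str(value), pattern):
                return False
        return True


def lock_satisfies(specs: Iterable[SimpleSpec], locked: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if every spec is matched by at least one locked package."""
    locked = list(locked)
    return all(any(spec.matches(pkg) for pkg in locked) for spec in specs)
```

With today's lockfile entries, which carry no build field, a spec that constrains the build can never be confirmed, so the solve could not be skipped. That is the motivation for recording every property a spec may match against.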

After reviewing model.py and repodata.json, I found some diverging or missing fields:

Although not all properties may be exposed by Conda/Mamba, we should consider adding these fields to the model. WDYT?

maresb commented 1 year ago

I'm in favor of adding all the available data.

One potential challenge is that we work with PyPI dependencies, and I'm not sure if or how much we use the MatchSpec for this purpose.

Another challenge is the divergence between conda and libmamba. Indeed, this weekend I've been trying to improve the reliability of the conda-lock CI tests against apparent race conditions. One annoyance about Micromamba is the lack of pkgs_dirs in micromamba info --json. I'd like to be able to locate repodata.json so that I get access to missing data, and it would make things much easier if I could do this with Micromamba. Do either of you have ideas for how to compute pkgs_dirs without Conda?

Prefix looks like a very exciting venture!!! I'm really looking forward to what comes out of it. Also, I really wish I knew Rust! :smile:

tdejager commented 1 year ago

Thanks! @maresb :) We are as well! 😄

Do you think we would need to introduce a new version if we add any extra fields? I suppose the version is still at 1 but I'm unsure when you want to bump it.

I've asked about the pkgs_dirs on our Zulip :)

maresb commented 1 year ago

Thanks!!!

To clarify, are you asking about the lockfile version? If so, then my understanding (meaning what I say should be verified with people like Marius and Wolf) is that this integer for version is semantically major so that backwards compatible changes don't require an increment. My understanding is also that adding a new optional field is backwards compatible. So I believe that as long as nothing is renamed we can stay at 1.
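A small sketch of why an optional field should be backwards compatible: entries written before the field existed still load, and the new field simply defaults to empty. `LockedPackage` and `load_entry` here are hypothetical and much smaller than conda-lock's actual model.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class LockedPackage:
    """Hypothetical, reduced lockfile entry (not conda-lock's real model)."""

    name: str
    version: str
    url: str
    # Newly added optional fields: absent in older lockfiles.
    build: Optional[str] = None
    build_number: Optional[int] = None


def load_entry(raw: Mapping[str, Any]) -> LockedPackage:
    """Build an entry from a parsed lockfile mapping, ignoring unknown keys."""
    known = {f.name for f in fields(LockedPackage)}
    return LockedPackage(**{k: v for k, v in raw.items() if k in known})


old_entry = load_entry(
    {"name": "pip", "version": "23.0.1", "url": "https://conda.anaconda.org/conda-forge/noarch/pip-23.0.1-pyhd8ed1ab_0.conda"}
)
new_entry = load_entry(
    {
        "name": "pip",
        "version": "23.0.1",
        "url": "https://conda.anaconda.org/conda-forge/noarch/pip-23.0.1-pyhd8ed1ab_0.conda",
        "build": "pyhd8ed1ab_0",
        "build_number": 0,
    }
)
```

Renaming or removing a field would break this property, which matches the reasoning above that only such changes would require bumping the version.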

AFAIK there is no formal spec, so it feels a bit silly to be discussing as if there were one. But we should probably formalize it so that versions and changes are concrete.

tdejager commented 1 year ago

Ah okay! Yeah that is what I meant :)

wolfv commented 1 year ago

@maresb it's maybe not as nice but you can get this as YAML:

╰─$ micromamba config list pkgs_dirs
pkgs_dirs:
  - /Users/wolfv/micromamba/pkgs
  - /Users/wolfv/.mamba/pkgs

wolfv commented 1 year ago

It should be quite trivial to add this info to the info --json output as well :)
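Until that lands, the YAML output shown above can be read with a few lines of Python. This is a minimal sketch that assumes the output has exactly the shape shown (a `pkgs_dirs:` key followed by unquoted `- path` items); `parse_pkgs_dirs` and `micromamba_pkgs_dirs` are hypothetical helper names, and a real YAML parser would be more robust.

```python
import subprocess
from typing import List


def parse_pkgs_dirs(output: str) -> List[str]:
    """Extract the pkgs_dirs list from `micromamba config list pkgs_dirs` output."""
    dirs: List[str] = []
    in_section = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "pkgs_dirs:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            dirs.append(stripped[2:].strip())
        elif stripped:
            # Any other non-empty line ends the pkgs_dirs section.
            in_section = False
    return dirs


def micromamba_pkgs_dirs(exe: str = "micromamba") -> List[str]:
    """Run the command from the comment above and parse its output."""
    result = subprocess.run(
        [exe, "config", "list", "pkgs_dirs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pkgs_dirs(result.stdout)
```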

maresb commented 1 year ago

Awesome, thanks for the tip!!! This will be an enormous help.