Galaxy handles version+build inconsistently which leads to inconsistent mulled v2 hashes

bernt-matthias commented 4 months ago

Describe the bug

When calling mulled-build-tool build superdsm.xml (tool) the version of the blas requirement 1.0=mkl is taken as is, i.e. the build mkl is considered as part of the version string, i.e. CondaTarget[package=blas,version=1.0=mkl]]

When calling mulled-hash --hash v2 numpy=1.20,cvxpy=1.1.13,scipy=1.6.3,mkl=2020.0,scikit-image=0.18.1,blas=1.0=mkl,giatools=0.1.1,cvxopt=1.2.6,ray-core=1.6.0,superdsm=0.2.0 the version and build information is separated, i.e. the version of blas is considered as 1.0.

This leads to different hashes.

I noticed this here: https://github.com/BioContainers/multi-package-containers/blob/master/combinations/mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf%3Ae21c53b6a8567d30d6760e0b0b1671eaa51ffe42-0.tsv which builds the container mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:8ce732fcf4a4b517eab23c0a67dc725b642854c1

When I run the mulled container resolver it is looking for mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:e21c53b6a8567d30d6760e0b0b1671eaa51ffe42 (which is currently not build).

Galaxy Version and/or server at which you observed the bug Galaxy Version: (check /api/version if you don't know) Commit: (run git rev-parse HEAD if you run this Galaxy server)

To Reproduce

run mulled-hash --hash v2 numpy=1.20,cvxpy=1.1.13,scipy=1.6.3,mkl=2020.0,scikit-image=0.18.1,blas=1.0=mkl,giatools=0.1.1,cvxopt=1.2.6,ray-core=1.6.0,superdsm=0.2.0 with and without the following patch and observe that the hashes are the two hashes that are currently used at different places in our infrastructure

mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:e21c53b6a8567d30d6760e0b0b1671eaa51ffe42 (expected by the mulled container resolver, used in the filename at multi-package-container repo)
mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:8ce732fcf4a4b517eab23c0a67dc725b642854c1 (produced by mulled-build-tool)

--- a/lib/galaxy/tool_util/deps/mulled/mulled_build.py
+++ b/lib/galaxy/tool_util/deps/mulled/mulled_build.py
@@ -532,10 +532,10 @@ def target_str_to_targets(targets_raw):
         if "=" in target_str:
             package_name, version = target_str.split("=", 1)
             build = None
-            if "=" in version:
-                version, build = version.split("=")
-            elif "--" in version:
-                version, build = version.split("--")
+            # if "=" in version:
+            #     version, build = version.split("=")
+            # elif "--" in version:
+            #     version, build = version.split("--")
             target = build_target(package_name, version, build)
         else:
             target = build_target(target_str)

Expected behavior

Consistent use of the build for the v2 hash. Not sure which it should be.

I guess the fix will be as shown in the diff, i.e. to make mulled-hash consider the build as part of the version. Wanted to write it down first and get comments.

mvdbeek commented 4 months ago

The diff would break a lot of things, = is supposed to be an indicator for the build. https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tool_util/deps/mulled/mulled_build_tool.py#L51 looks wrong to me, as it skips the build information. I would say the line should be replaced with target_str_to_targets(package_requirements)

bernt-matthias commented 4 months ago

I think the diff is actually the fix .. and should not break anything (in the Galaxy core). I tries to back this up with tests https://github.com/galaxyproject/galaxy/pull/18522

mvdbeek commented 4 months ago

I don't think that's right, I've added this fix in https://github.com/galaxyproject/galaxy/pull/9268. What exactly seems wrong with my suggestion ?

bernt-matthias commented 4 months ago

We construct the second part of the hash from the "version". The question is, what is considered part of the version. There are the following possibilities

version (i.e. ignoring everything after = or --)
version+build (i.e. keeping everything)

for the linked problem that you tried to fix (https://github.com/BioContainers/multi-package-containers/pull/953) this is either transit=3.0.2,python=3.7 or transit=3.0.2=py_1,python=3.7. There you observed a difference between

the hash in the filename (mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:b1fdad9795a35d257e267de57ccfb6e18d8da4f5-1.tsv) .. which corresponds to case 1 and
the image name that was pushed (mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:d3bc5b63772cbc63141b3e2a57ee38c4b60a055d) .. which corresponds to case 2.

Now the question is which of the two is "correct" (and what does "correct" mean) -- but "correct" is probably not the correct word here. However, my assumption was that the case that the mulled container resolver is looking for is the correct one. If you run planemo test --biocontainers --no_dependency_resolution on

<tool id="transit" name="transit" version="1" profile="20.01" license="MIT">
    <requirements>
        <requirement type="package" version="3.0.2=py_1">transit</requirement>
        <requirement type="package" version="3.7">python</requirement>
    </requirements>
    <command>
        echo 1
    </command>
    <inputs>
        <param type="integer" name="test"/>
    </inputs>
    <outputs>
        <data name="out" format="txt"/>
    </outputs>
    <tests>
        <test>
            <param name="test" value="1"/>
            <output name="out">
                <assert_contents>
                    <has_size size="0"/>
                </assert_contents>
            </output>
        </test>
    </tests>
</tool>

You will see:

Image name for tool transit: mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:d3bc5b63772cbc63141b3e2a57ee38c4b60a055d

That is, the mulled container resolver (actually the tool parsing that in the end constructs the version attribute of the conda targets) considers the build as part of the version. The problem is that in the multi-package-containers repo mulled-build-files is used which does not consider the build as part of the version for constructing the image. That is we are constructing images under names that are not found by the resolver

But of course my assumption may well be wrong. Then your suggestion would make mulled-build-tool behave like mulled-hash and mulled-build-files but we would still need to adapt the mulled container resolver and planemo.

I'm not sure which of the two alternatives is better -- but at first I liked the idea that the build info is not considered for constructing the hash. But changing the mulled container resolver seemed more intrusive to me. But in the end we are talking probably only about a few cases where the build info is used.

mvdbeek commented 4 months ago

I don't understand what you're saying here and it's so indirect that I cannot follow. I think it's quite simple in the end, we construct a target, and there are two kinds of build separators that we need to allow. I don't see a reason to change this.

Say we we use your suggestion:

    def parse_target(target_str):
        if "=" in target_str:
            package_name, version = target_str.split("=", 1)
            build = None
            target = build_target(package_name, version, build)
        else:
            target = build_target(target_str)
        return target

build is always going to be None, so this is always going to build a CondaTarget without a build. I fail to see how this would be correct.

We should also not argue about image names, the way mulled build works is it checks if an image needs to be built, and if so it will do that.

bernt-matthias commented 4 months ago

build is always going to be None, so this is always going to build a CondaTarget without a build

That's a good point which I have not considered yet, but only the first part of this statement is correct .. the second part is not, since we never remove the build information from the version. I extended the test to show this point.

I think it's quite simple in the end, we construct a target, and there are two kinds of build separators that we need to allow. I don't see a reason to change this.

I would be 100% fine with this if we would consider them at all places of our code. For constructing the v2 hashes we are not consistent at the moment.

We should also not argue about image names, the way mulled build works is it checks if an image needs to be built, and if so it will do that.

I think we definitely have to :) Because it just makes no sense if the building method constructs name A for a set of requirements and the container resolver looks for a name B. I don't care if its A or B -- it just should be the same name. The bottom line of my argument is that at the moment none of the images that were constructed from sets of requirements that contains build information will be used in Galaxy because the the container resolver expects a different name (actually only a different version hash).

The question whether A or B is correct boils down to the question if r1=v1=b1,r2=v2=b2 should have the same image name as r1=v1,r2=v2. But irrespective of this question we have to decide for one option.

galaxyproject / galaxy

Galaxy handles version+build inconsistently which leads to inconsistent mulled v2 hashes #18513