Open bernt-matthias opened 4 months ago
The diff would break a lot of things, =
is supposed to be an indicator for the build. https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tool_util/deps/mulled/mulled_build_tool.py#L51 looks wrong to me, as it skips the build information. I would say the line should be replaced with target_str_to_targets(package_requirements)
I think the diff is actually the fix .. and should not break anything (in the Galaxy core). I tries to back this up with tests https://github.com/galaxyproject/galaxy/pull/18522
I don't think that's right, I've added this fix in https://github.com/galaxyproject/galaxy/pull/9268. What exactly seems wrong with my suggestion ?
We construct the second part of the hash from the "version". The question is, what is considered part of the version. There are the following possibilities
=
or --
)for the linked problem that you tried to fix (https://github.com/BioContainers/multi-package-containers/pull/953) this is either transit=3.0.2,python=3.7
or transit=3.0.2=py_1,python=3.7
. There you observed a difference between
mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:b1fdad9795a35d257e267de57ccfb6e18d8da4f5-1.tsv
) .. which corresponds to case 1 and mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:d3bc5b63772cbc63141b3e2a57ee38c4b60a055d
) .. which corresponds to case 2.Now the question is which of the two is "correct" (and what does "correct" mean) -- but "correct" is probably not the correct word here. However, my assumption was that the case that the mulled container resolver is looking for is the correct one. If you run planemo test --biocontainers --no_dependency_resolution
on
<tool id="transit" name="transit" version="1" profile="20.01" license="MIT">
<requirements>
<requirement type="package" version="3.0.2=py_1">transit</requirement>
<requirement type="package" version="3.7">python</requirement>
</requirements>
<command>
echo 1
</command>
<inputs>
<param type="integer" name="test"/>
</inputs>
<outputs>
<data name="out" format="txt"/>
</outputs>
<tests>
<test>
<param name="test" value="1"/>
<output name="out">
<assert_contents>
<has_size size="0"/>
</assert_contents>
</output>
</test>
</tests>
</tool>
You will see:
Image name for tool transit: mulled-v2-98e94080f7dfc7de31b799237ba9a50efe3d7fd8:d3bc5b63772cbc63141b3e2a57ee38c4b60a055d
That is, the mulled container resolver (actually the tool parsing that in the end constructs the version attribute of the conda targets) considers the build as part of the version. The problem is that in the multi-package-containers repo mulled-build-files
is used which does not consider the build as part of the version for constructing the image. That is we are constructing images under names that are not found by the resolver
But of course my assumption may well be wrong. Then your suggestion would make mulled-build-tool
behave like mulled-hash
and mulled-build-files
but we would still need to adapt the mulled container resolver and planemo.
I'm not sure which of the two alternatives is better -- but at first I liked the idea that the build info is not considered for constructing the hash. But changing the mulled container resolver seemed more intrusive to me. But in the end we are talking probably only about a few cases where the build info is used.
I don't understand what you're saying here and it's so indirect that I cannot follow. I think it's quite simple in the end, we construct a target, and there are two kinds of build separators that we need to allow. I don't see a reason to change this.
Say we we use your suggestion:
def parse_target(target_str):
if "=" in target_str:
package_name, version = target_str.split("=", 1)
build = None
target = build_target(package_name, version, build)
else:
target = build_target(target_str)
return target
build is always going to be None
, so this is always going to build a CondaTarget without a build
. I fail to see how this would be correct.
We should also not argue about image names, the way mulled build works is it checks if an image needs to be built, and if so it will do that.
build is always going to be None, so this is always going to build a CondaTarget without a build
That's a good point which I have not considered yet, but only the first part of this statement is correct .. the second part is not, since we never remove the build information from the version. I extended the test to show this point.
I think it's quite simple in the end, we construct a target, and there are two kinds of build separators that we need to allow. I don't see a reason to change this.
I would be 100% fine with this if we would consider them at all places of our code. For constructing the v2 hashes we are not consistent at the moment.
We should also not argue about image names, the way mulled build works is it checks if an image needs to be built, and if so it will do that.
I think we definitely have to :) Because it just makes no sense if the building method constructs name A for a set of requirements and the container resolver looks for a name B. I don't care if its A or B -- it just should be the same name. The bottom line of my argument is that at the moment none of the images that were constructed from sets of requirements that contains build information will be used in Galaxy because the the container resolver expects a different name (actually only a different version hash).
The question whether A or B is correct boils down to the question if r1=v1=b1,r2=v2=b2
should have the same image name as r1=v1,r2=v2
. But irrespective of this question we have to decide for one option.
Describe the bug
When calling
mulled-build-tool build superdsm.xml
(tool) the version of the blas requirement1.0=mkl
is taken as is, i.e. the buildmkl
is considered as part of the version string, i.e.CondaTarget[package=blas,version=1.0=mkl]]
When calling
mulled-hash --hash v2 numpy=1.20,cvxpy=1.1.13,scipy=1.6.3,mkl=2020.0,scikit-image=0.18.1,blas=1.0=mkl,giatools=0.1.1,cvxopt=1.2.6,ray-core=1.6.0,superdsm=0.2.0
the version and build information is separated, i.e. the version of blas is considered as1.0
.This leads to different hashes.
I noticed this here: https://github.com/BioContainers/multi-package-containers/blob/master/combinations/mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf%3Ae21c53b6a8567d30d6760e0b0b1671eaa51ffe42-0.tsv which builds the container
mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:8ce732fcf4a4b517eab23c0a67dc725b642854c1
When I run the mulled container resolver it is looking for
mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:e21c53b6a8567d30d6760e0b0b1671eaa51ffe42
(which is currently not build).Galaxy Version and/or server at which you observed the bug Galaxy Version: (check/api/version if you don't know)
Commit: (run
git rev-parse HEAD
if you run this Galaxy server)To Reproduce
run
mulled-hash --hash v2 numpy=1.20,cvxpy=1.1.13,scipy=1.6.3,mkl=2020.0,scikit-image=0.18.1,blas=1.0=mkl,giatools=0.1.1,cvxopt=1.2.6,ray-core=1.6.0,superdsm=0.2.0
with and without the following patch and observe that the hashes are the two hashes that are currently used at different places in our infrastructuremulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:e21c53b6a8567d30d6760e0b0b1671eaa51ffe42
(expected by the mulled container resolver, used in the filename at multi-package-container repo)mulled-v2-3d3f711fe6839930a8c445eee29baa27b67145bf:8ce732fcf4a4b517eab23c0a67dc725b642854c1
(produced bymulled-build-tool
)Expected behavior
Consistent use of the build for the v2 hash. Not sure which it should be.
I guess the fix will be as shown in the diff, i.e. to make
mulled-hash
consider the build as part of the version. Wanted to write it down first and get comments.