galaxy-iuc / standards

Documentation for standards and best practices from the Galaxy IUC
http://galaxy-iuc-standards.readthedocs.io/en/latest/
6 stars 16 forks source link

Tool versions vs wrapper versions #51

Closed Slugger70 closed 6 years ago

Slugger70 commented 6 years ago

After some discussion with @bgruening, we would like to talk about a versioning system for galaxy tool wrappers which includes a combination of information about the underlying tool version and the wrapper version.

The idea is that the version contained within the <tool id="tool_id" name="Tool Name" version="x.x.x"> block be formatted in such a way as to be informative of the underlying tool version as well as the version of the wrapper for that tool.

At the moment, it seems to be one or the other.. Some people use the underlying tool version and some a wrapper version.

If we could have two tokens in macros.xml, one for tool version and one for wrapper version. For example:

<token name="@TOOL_VERSION@">0.8</token>
<token name="@WRAPPER_VERSION@">1</token>

Then they are referred to in the <tool> tag as version="@TOOL_VERSION@.@WRAPPER_VERSION@" which for the example would give 0.8.1.

Then if you merely update the wrapper without updating the tool, the @WRAPPER_VERSION@ increases by 1 similar to build number in bioconda, but if you update the underlying tool, then the wrapper version resets back to 1.

Of course, having a . as the version delimiter might become confusing, and also what to do for wrappers that have multiple tool requirements? etc.

bgruening commented 6 years ago

The idea behind @WRAPPER_VERSION@ is a little bit the same as build number for conda. In most of our tools we are already following this approach, but I think it would be nice to put this into variables. We do have some plans to update tools automatically and keep @TOOL_VERSION@ in sync with conda. @WRAPPER_VERSION@ will then be reset to 0.

martenson commented 6 years ago

Sidenote: Recently we were solving some versioning problems, it is good to know that Galaxy only correctly orders versions that follow PEP440

peterjc commented 6 years ago

I'm not keen on this exact proposal, but agree the current siutation is a mess.

Many underlying tools are inconsistent about the number of components of their dot separated versions, which makes breaking down the compound version even harder.

On an IUC tool the version of the underlying tool(s) is already explicitly recorded in the XML under the requirements block. Would a better idea be to enhance Galaxy to use this information and report the underlying tool version(s) more prominently in the UI?

martenson commented 6 years ago

Would a better idea be to enhance Galaxy to use this information and report the underlying tool version(s) more prominently in the UI?

The last time I remember this proposal did not make it through, as being too distracting and not interesting enough to majority of users. But at least we made the requirements contents available on the tool's menu.

bgruening commented 6 years ago

@peterjc afaik this proposal does not touch the user-facing side. It just clarifies the variable names for the devoloper. We would like to standardize the variable names to be able to increment them automatically.

On an IUC tool the version of the underlying tool(s) is already explicitly recorded in the XML

This one will also use the @TOOL_VERSION@ token.

peterjc commented 6 years ago

@martenson on the UI that "Options" menu seems mislabelled... would it be tricky to move the dependencies to the "Versions" download instead?

peterjc commented 6 years ago

@bgruening if you and @Slugger70 are suggesting this:

<tool id="tool_id" name="Tool Name" version="@TOOL_VERSION@.@WRAPPER_VERSION@" ...>

Then it is user facing, and it will be confusing.

bgruening commented 6 years ago

Mh, this is what we do in many tools already, e.g. https://github.com/galaxyproject/tools-iuc/blob/master/tools/bedtools/bamToFastq.xml#L1. It is more a technical issue that a tool needs to have a version. We can happily change the display of that or hide the wrapper version, but we need to have a version=.

If it is a . or a _ is imho irrelevant at least for the purpose of automating tool updates. For this it would be nice to agree on a variable name(s) that we use in our macros or elsewhere.

Slugger70 commented 6 years ago

I agree that <tool id="tool_id" name="Tool Name" version="@TOOL_VERSION@.@WRAPPER_VERSION@" ...> could be confusing, and that the . may not be a good delimiter.

If we can explain to the users the versioning system (tool vs wrapper) then it may not be such a problem.

Slugger70 commented 6 years ago

You're right @peterjc, the current system is a bit of a mess.

I think most users seem to think the wrapper is the tool and that the wrapper version is the underlying tool version.. I was told by a user the other day that the tool on Galaxy is really old as it's still on version 1 (the wrapper version) when the tool is at version 3.5 (which is in the xml..)

peterjc commented 6 years ago

So from the UI point of view, what we want to prioritise in the UI is the underlying tool version - with the tool wrapper version to be downplayed.

mvdbeek commented 6 years ago

Yeah, this is tricky. I guess maybe version should be re-baptised to wrapper_version, and the user-facing version selection should feature the underlying requirements. This is now a lot easier with tools usually just having 1 or 2 requirements. But that would be complicated to pull off :/

martenson commented 6 years ago

@bgruening

this is what we do in many tools already, e.g. https://github.com/galaxyproject/tools-iuc/blob/master/tools/bedtools/bamToFastq.xml#L1

This approach is what caused the vcflib tools being wrongly ordered by Galaxy.

peterjc commented 6 years ago

@mvdbeek Yes, with a special case fall back when there are zero declared requirements (e.g. self contained tool, not actually a wrapper)

[Typo fixed; 9 and 0 are next to each other]

martenson commented 6 years ago

What would the approach be when the wrapper version is already 'ahead' of the underlying tool? (e.g. samtools_idxstats has wrapper 2.0.1 but uses samtools 1.3.1)

peterjc commented 6 years ago

From a sorting point of view 1.3.1.2.0.1 makes sense in that example

Slugger70 commented 6 years ago

Our whole idea behind this was to easily automatically update wrappers for testing whenever there is a change in bioconda versions. We could monitor bioconda for changes in tool versions, then auto-update wrappers that use that tool and previous version and submit them to testing. The ones that pass - great - push them to the toolshed. The ones that fail - flag for manual cleanup, test improvement etc.

martenson commented 6 years ago

@peterjc

>>> packaging.version.parse('1.3.1.2.0.1') < packaging.version.parse('2.0.1')
True
peterjc commented 6 years ago

Ah - I see what you mean, a mix of the old and the proposed numbering would just break in that case. Sigh.

bgruening commented 6 years ago

Old tools that do not follow this schema already can not be touched. In this case we do not share a variable between the requirement and the tool-version. This is currently true and can not be changed.

IUC recommendation should be used for new tools, but they do not imply that we can use them for old tools. For example see our rules for the tool_id.

peterjc commented 6 years ago

OK, if this is only done for new tools, and the wrapper number is a single integer (like the build number in conda recipes), then its growing on me.

Continuing that analogy, I think conda is using a minus sign as the separator for the build number suffix. We'd like to be PEP440 compliant and I think this would be OK?

peterjc commented 6 years ago

@martenson likes the minus based suffix idea: https://github.com/galaxyproject/tools-iuc/pull/1784#issuecomment-375379721

martenson commented 6 years ago

@peterjc I do like your idea, packaging.version parses it fine too.

mvdbeek commented 6 years ago

I guess this is the best we can do if we want to carry wrapper and tool version in the same entity, but this will cause problems for dependencies that also specify a minus-based suffix, right ?

martenson commented 6 years ago

@mvdbeek do you mean example like this? (This looks fine)

>>> packaging.version.parse('1.0.0-rc1')
<Version('1.0.0rc1')>
>>> packaging.version.parse('1.0.0-rc1-0')
<Version('1.0.0rc1.post0')>
mvdbeek commented 6 years ago

Well, can you tell me for for 1.0.0-rc1-0 if we added it or upstream released this ? And if upstream released this, we'll get a LegacyVersion:

In [2]: packaging.version.parse('1.0.0-rc1-0-0')
Out[2]: <LegacyVersion('1.0.0-rc1-0-0')>
mvdbeek commented 6 years ago

(Just pointing it out, I don't have a better suggestion)

martenson commented 6 years ago

@mvdbeek I cannot, but do we care?

mvdbeek commented 6 years ago

Yes, if upstream releases '1.0.0-rc1-0' users will see this and think they're using 1.0.0-rc1-0, when they're actually still using 1.0.0-rc1.

martenson commented 6 years ago

@mvdbeek I see your edit now. According to our tool version data from yesterday these shouldn't be too frequent. I think it is a good standard choice.

mblue9 commented 6 years ago

Knowing next to nothing about version schemes or PEP440 but having often felt concerned about the potential for confusion on the point @mvdbeek raised above:https://github.com/galaxy-iuc/standards/issues/51#issuecomment-375395467, as that would confuse me. I've often wished it was much clearer what's a Galaxy version vs tool version and had wondered if there could a really obvious "g" to indicate Galaxy wrapper version?

I see these local_version_identifiers mentioned here: https://www.python.org/dev/peps/pep-0440/#adding-local-version-identifiers.

So could you have e.g. tool_version+g1

and is this point from that PEP440 section relevant here?

The plus is chosen primarily for readability of local version identifiers. It was chosen instead of the hyphen to prevent pkg_resources.parse_version from parsing it as a prerelease, which is important for enabling a successful migration to the new, more structured, versioning scheme.

but like I said I don't know much about this at all so would defer to all you more knowledgeable people above 😄

peterjc commented 6 years ago

I was also wondering about this overnight, and if we could use w1, w2 etc as wrapper version suffices. The g1 suffix from @mblue9 is much the same, but I think they are right that the PEP440 local version identifier scheme would work here.

This is using a plus sign - see https://www.python.org/dev/peps/pep-0440/#local-version-identifiers

So for a tool version 1.0.0-rc1 we could append +w1 or +g1 giving 1.0.0-rc1+g1 which seems clearer than using the -1 suffix.

hexylena commented 6 years ago

Always was fond of the Ubuntu x.y.z+ubuntuN as being much more explicit about what was the upstream revision vs the version being packaged for your system. +1 for +galaxy (the single g is a bit too concise/opaque for my person preferences but would be better than the current situation)

peterjc commented 6 years ago

If there are no UI concerns about the length, then I like adding +galaxy0, +galaxy1, +galaxy2, ... with a integer wrapper version like a build number (reset to zero for a tool version bump).

mvdbeek commented 6 years ago

If that's the consensus we could separate out the galaxy-specific suffix and display galaxy wrapper version and underlying tool version as separate items in the UI.

peterjc commented 6 years ago

Yes, good point. Not only is ${TOOL}+galaxy${WRAPPER} easier for humans to parse, it is unambiguous enough that the UI could parse it automatically.

martenson commented 6 years ago

If that's the consensus we could separate out the galaxy-specific suffix and display galaxy wrapper version and underlying tool version as separate items in the UI.

This has been discussed in the past and I lost when defending it. (see above "being too distracting and not interesting enough to majority of users"). That said we can try push it again. We should find a good way and place for representing this info before we a open a PR though.

Also what to do when a tool relies on multiple SW packages with separate versions?

nsoranzo commented 6 years ago

${TOOL}+galaxy${WRAPPER} is a good standard that we should suggest as IUC and in the Galaxy tool schema documentation, but it is self-explaining enough that I don't think we need to special-case the UI for it (especially now that is not used much if at all).

Also what to do when a tool relies on multiple SW packages with separate versions?

In this case, either there's a main identifying package we can use for the version, or we use a wrapper-specific version (without the +galaxyN part).

martenson commented 6 years ago

first tool to follow this standard is vcflib: https://github.com/galaxyproject/tools-iuc/pull/1784