bio-tools / biotoolsRegistry

biotoolsregistry: discovery portal for bioinformatics
GNU General Public License v3.0

Disallow registration of entries for new versions of a tool but with trivial differences only #160

Open joncison opened 7 years ago

joncison commented 7 years ago

The API and GUI should discourage or (ideally) disallow registration of new tool versions that differ only trivially. The intention was for bio.tools to include tool versions that differ substantially from an end-user / usage perspective, not merely to record the fact that the version number has changed (for whatever, possibly trivial, reason).

It boils down to settling what constitutes a valid new version. I'd suggest:

Ignore EDAM formats for now.
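The suggested criteria themselves did not survive in this copy of the thread. Purely as an illustration, here is a minimal sketch of how a registration check might work, assuming (hypothetically) that a new version counts as "trivial" when its function annotations, i.e. EDAM operations plus input and output data types, are unchanged, with EDAM formats ignored as suggested above. The class and function names are made up for this example and are not part of bio.tools or biotoolsSchema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """One function annotation: EDAM operations plus input/output data types.

    Formats are stored but deliberately left out of comparisons, following
    the "ignore EDAM formats for now" suggestion. (Hypothetical model.)
    """
    operations: frozenset
    inputs: frozenset           # EDAM data terms
    outputs: frozenset          # EDAM data terms
    formats: frozenset = frozenset()

    def signature(self):
        # Everything except formats counts toward deciding "trivial or not".
        return (self.operations, self.inputs, self.outputs)


def is_trivial_new_version(old_functions, new_functions):
    """Return True if the new version's annotations (ignoring formats) match
    the old version's exactly, i.e. under this assumed criterion it should
    not be registered as a separate entry."""
    old_sigs = {f.signature() for f in old_functions}
    new_sigs = {f.signature() for f in new_functions}
    return old_sigs == new_sigs
```

Under this assumption, a release that only adds support for another file format would be flagged as trivial, while one that adds a new operation would not.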

Once settled, the above should be captured:

bug1303 commented 7 years ago

I see some problems in regard to maintenance here (again depending on what bio.tools should and should not be...):

Precise version numbers are required to ensure reproducibility of any bioinformatics project.

If bio.tools actually becomes the authority for IDs that are also referenced in publications, and those IDs include a version number, they must not suddenly disappear. At the same time, developers will want to keep their version numbers up to date.

With a view to integrating with bioconda: one of its key features (really the selling point for me) is environments in which you can pin specific versions of certain packages and share those environments with your co-workers.

Of course, having all kinds of minor versions creates a lot of redundant information. Ideally, when searching bio.tools, I would only see the latest version on the results page (or an earlier major version where there have been significant changes, as described above). But it should still be possible to reference a specific version.
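A minimal sketch of that idea, assuming a hypothetical in-memory index (not how bio.tools actually stores entries): search returns one hit per tool, namely its latest version, while every (tool, version) pair stays resolvable. The version ordering here is a simplifying assumption that only compares leading dotted integers, so it is not full semantic or PEP 440 versioning.

```python
import re
from collections import defaultdict


def version_key(version):
    """Turn a version string such as '2.10.1' into a sortable tuple.

    Simplifying assumption: only the leading dotted integers are used;
    suffixes such as '-beta' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


class VersionIndex:
    """Hypothetical index: one search result per tool (its latest version),
    while exact (tool, version) lookups keep working."""

    def __init__(self):
        self._by_tool = defaultdict(dict)  # tool id -> {version: record}

    def add(self, tool_id, version, record):
        self._by_tool[tool_id][version] = record

    def search_results(self):
        # One row per tool: the highest version according to version_key.
        return {
            tool_id: max(versions, key=version_key)
            for tool_id, versions in self._by_tool.items()
        }

    def resolve(self, tool_id, version):
        # Exact lookup, so a version cited in a paper never "disappears".
        return self._by_tool[tool_id][version]
```

Comparing numerically rather than as strings matters: as strings, "1.9" sorts after "1.10", so a naive index would show the wrong "latest" version.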

Just some examples of why a version number might change...

joncison commented 7 years ago

Thanks, and I agree: version information is crucial, for all the reasons given.

The current position (from discussions with @ekry et al.) is that we should associate version information with (at least) a publication ID and (maybe) other fields, e.g. download, so that it is at least clear which version number is associated with which publication and with which downloads, such as a binary or source package.

In this scenario, we'd retain versionIDs, but these would not be part of the tool URL. This is a profound change, i.e. abandoning (for now at least) our ambition of providing: https://bio.tools/toolID/versionID

and simply supporting: https://bio.tools/toolID/

which provides a unique and (once clean-up of toolIDs is complete, cc @hans) persistent reference to the tool. We think this is a more realistic aim, at least in the first instance, given the available resources.

This is a critical issue, so discussion here is welcome.

joncison commented 6 years ago

Just a note, cc @bug1303: in biotoolsSchema 3.0.0 (supported in the next release of bio.tools) you can assign version information to a publication, download and otherID of a tool. The entry itself can also receive version information in a flexible way. But this version information isn't (and won't be) baked into the tool identifiers themselves (see https://biotools.readthedocs.io/en/latest/what_is_biotools.html#bio-tools-tool-identifiers).

The version number is a precise thing - provided by the tool developer, but distinct from the tool identifier (provided by bio.tools, based on supplied tool names).
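To make the separation concrete, here is a minimal sketch of an entry in which version strings hang off the entry, its publications, downloads and other IDs, while the URL is built from the tool identifier alone. The classes are loosely modelled on the description above; their field names are assumptions for illustration, not the exact biotoolsSchema 3.0.0 JSON.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Publication:
    doi: str
    version: Optional[str] = None   # tool version this paper describes


@dataclass
class Download:
    url: str
    type: str                       # e.g. "Source package", "Binaries"
    version: Optional[str] = None


@dataclass
class OtherID:
    value: str
    version: Optional[str] = None


@dataclass
class ToolEntry:
    biotools_id: str                # assigned by bio.tools from the tool name
    name: str
    version: List[str] = field(default_factory=list)  # supplied by the developer
    publication: List[Publication] = field(default_factory=list)
    download: List[Download] = field(default_factory=list)
    other_id: List[OtherID] = field(default_factory=list)

    def url(self):
        # The persistent reference is tool-level only: no version segment.
        return f"https://bio.tools/{self.biotools_id}"

    def items_for_version(self, version):
        """Publications and downloads explicitly tagged with `version`."""
        pubs = [p for p in self.publication if p.version == version]
        dls = [d for d in self.download if d.version == version]
        return pubs, dls
```

In this model, registering another release never changes `url()`; it only adds version strings to the entry and to the items that refer to that release.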

What constitutes a unique version of a tool (and what's registered) is thus down to the provider (the entries being subject to bio.tools admin curation). We're aiming for bio.tools records that capture major functional differences (not all versions of a tool have such differences). With more time and resources, we could go further. But for now, I'm closing the issue (feel free to re-open and comment!).

cc @hansioan

joncison commented 5 years ago

See to what extent biotoolsLint can detect suspected duplicate entries.
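As a starting point for that investigation, here is a minimal sketch of one possible heuristic. It is not biotoolsLint's actual behaviour or API. It assumes entries are plain dicts with hypothetical `id`, `name` and `homepage` keys, and flags a pair when the names match after stripping a trailing, separated version token, or when the homepages match after normalisation. Results are "suspected" duplicates meant for curator review, not automatic rejection.

```python
import re
from itertools import combinations
from urllib.parse import urlparse


def normalise_name(name):
    """Lower-case, drop a trailing separated version token such as ' v2.1'
    or '-3', then strip non-alphanumerics. A version glued to the name
    (e.g. 'Bowtie2') is kept, since it often denotes a distinct tool."""
    name = name.lower()
    name = re.sub(r"[\s_-]+v?\d+(?:\.\d+)*$", "", name)
    return re.sub(r"[^a-z0-9]", "", name)


def normalise_homepage(url):
    """Compare host plus path, ignoring scheme, a leading 'www.' and a
    trailing slash."""
    parsed = urlparse(url.strip().lower())
    host = parsed.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path.rstrip("/")


def suspected_duplicates(entries):
    """Yield (id, id) pairs whose normalised names or homepages match."""
    for a, b in combinations(entries, 2):
        same_name = normalise_name(a["name"]) == normalise_name(b["name"])
        same_home = normalise_homepage(a["homepage"]) == normalise_homepage(b["homepage"])
        if same_name or same_home:
            yield a["id"], b["id"]
```

Pairwise comparison is quadratic in the number of entries. For a registry-wide scan, grouping entries by normalised name and by homepage key first would be the more practical design.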