ga4gh / TASC

TASC aids the harmonisation of aspects of GA4GH's various products that would otherwise prevent different products from being used together conveniently.
https://www.ga4gh.org

Ensure old versions of GA4GH standards are permanently available #19

Closed jb-adams closed 2 months ago

jb-adams commented 4 years ago

Problem Statement

As new features are added to existing GA4GH specifications, new versions of these standards will be released. Even when a new specification version is released, not all implementing organizations will want to or be able to make this jump right away. However, they must still be able to access older versions that they’ve implemented or plan to implement. Thus, older versions must still be accessible even when subsequent versions are released. Older versions should not be overwritten by later versions.

GA4GH specification repositories should maintain a complete history/provenance of all released versions. Each version should be accessible by a permanent, unique, unambiguous URL.

Impact on Alignment

This issue aims to harmonize the process by which various work streams and groups make different versions of their standards accessible.

Landscape Analysis

The OpenAPI-Specification GitHub repository provides a good model for maintaining a permanent listing of multiple versions of the OpenAPI spec. The versions/ folder contains a distinct specification Markdown file for each released version.

The hts-specs repository also maintains a permanent listing of multiple versions for file format specs.

Proposed Solution

To allow for permanently accessible historical specification versions, spec documents (OpenAPI .yaml files, Markdown docs, built HTML pages, etc.) should be named according to their version number. When a new release is being drafted, the new version should be written to a separate file, not overwriting previous versions. If the specification is available in multiple formats (e.g. a build tool generates HTML pages from the original .yaml), all formats should be available for all spec versions.
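
As a rough illustration of the naming rule above, here is a minimal Python sketch. The `versions/<spec>-<version>.<ext>` layout and the `versioned_path` and `publish` helpers are assumptions made up for this example, not an agreed GA4GH convention; the point is only that a released file is never overwritten.

```python
import re
from pathlib import Path

# Assumption: versions follow MAJOR.MINOR.PATCH, as in "1.2.0".
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def versioned_path(root: Path, spec: str, version: str, ext: str) -> Path:
    """Build a file path that embeds the version number (hypothetical layout)."""
    if not VERSION_PATTERN.match(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return root / "versions" / f"{spec}-{version}.{ext}"


def publish(root: Path, spec: str, version: str, ext: str, content: str) -> Path:
    """Write a new release to its own file, refusing to replace an existing one."""
    target = versioned_path(root, spec, version, ext)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError if this version was already published.
    with target.open("x", encoding="utf-8") as handle:
        handle.write(content)
    return target
```

Calling `publish(root, "htsget", "2.0.0", "yaml", ...)` would then sit alongside an existing `htsget-1.2.0.yaml` rather than replacing it, and the same call could be repeated for each output format.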

As the number of standards and versions continues to expand, the GA4GH software team can maintain a central registry, making the URLs pointing to every version of every standard accessible from a single source/API.
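
To make the registry idea a little more concrete, here is a small in-memory Python sketch. The `SpecRegistry` class, its method names, and the `specs.example.org` URLs are all hypothetical; a real registry would presumably sit behind an API and persistent storage.

```python
from typing import Dict, List, Tuple


class SpecRegistry:
    """Maps (standard, version) to one permanent URL (illustrative only)."""

    def __init__(self) -> None:
        self._urls: Dict[Tuple[str, str], str] = {}

    def register(self, standard: str, version: str, url: str) -> None:
        key = (standard, version)
        # A released version keeps its URL: re-pointing it is rejected.
        if key in self._urls and self._urls[key] != url:
            raise ValueError(f"{standard} {version} is already registered")
        self._urls[key] = url

    def url_for(self, standard: str, version: str) -> str:
        # Raises KeyError if the version was never registered.
        return self._urls[(standard, version)]

    def versions(self, standard: str) -> List[str]:
        # Sort numerically so that 1.10.0 comes after 1.2.0.
        found = [v for (s, v) in self._urls if s == standard]
        return sorted(found, key=lambda v: tuple(int(part) for part in v.split(".")))


registry = SpecRegistry()
registry.register("htsget", "1.2.0", "https://specs.example.org/htsget/1.2.0/")
registry.register("htsget", "2.0.0", "https://specs.example.org/htsget/2.0.0/")
```

With this in place, `registry.versions("htsget")` lists every registered version and `registry.url_for("htsget", "1.2.0")` keeps resolving after 2.0.0 is added.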

jmarshall commented 4 years ago

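It is not accurate to say that the hts-specs repository maintains a permanent listing of multiple versions of its specifications. This varies amongst the groups sharing this repository:

  • Refget, htsget, and crypt4gh each maintain only one current document
  • SAM maintains only documents for the current version, and these documents include appendices indicating which facilities were not present in earlier SAM versions or have otherwise changed since earlier versions
  • VCF and (to a lesser extent) CRAM maintain documents for each extant major VCF/CRAM version: this has pros (supposed spec for a particular version is available) and cons (clarifications and editorial changes must be applied to multiple documents, unclear to users what the difference between different versions is), cf samtools/hts-specs#32

OTOH all historical editions of all hts-specs documents are of course available via the Git history in the repository.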

jmarshall commented 4 years ago

To the extent that the proposed solution is about telling working groups how to organise their work, is that not off topic for TASC?

TASC will not work on projects that:

  • […]
  • [are] too imposing on how teams manage their operations

jb-adams commented 4 years ago

It is not accurate to say that the hts-specs repository maintains a permanent listing of multiple versions of its specifications. This varies amongst the groups sharing this repository:

  • Refget, htsget, and crypt4gh each maintain only one current document
  • SAM maintains only documents for the current version, and these documents include appendices indicating which facilities were not present in earlier SAM versions or have otherwise changed since earlier versions
  • VCF and (to a lesser extent) CRAM maintain documents for each extant major VCF/CRAM version — this has pros (supposed spec for a particular version is available) and cons (clarifications and editorial changes must be applied to multiple documents, unclear to users what the difference between different versions is), cf samtools/hts-specs#32

OTOH all historical editions of all hts-specs documents are of course available via the Git history in the repository.

In my original comment, I did specify the file-format specs within hts-specs (I can see multiple versions for CRAM, VCF, and BCF); I was not including htsget or refget in that assessment.

To the extent that the proposed solution is about telling working groups how to organise their work, is that not off topic for TASC?

TASC will not work on projects that:

  • […]
  • [are] too imposing on how teams manage their operations

One major reason this will be important is service-info. The type attribute contains a version, indicating the version of the spec. If a client finds an htsget 1.2.0 service registered within a service registry, the version number will lose its meaning if this iteration of the spec is unavailable because the spec tracked in GitHub has moved to 2.0.0. If an implementer doesn't have the resources to make the jump from 1.2.0 to 2.0.0, the 1.2.0 version document should still be available somehow so they can implement it comprehensively without worrying that the document will disappear. Telling the user/developer/researcher to look up old versions in the GitHub commit history is not really tenable.
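
To illustrate the client side of this, here is a minimal Python sketch that resolves a service-info response to the spec document it claims to implement. It assumes the type object carries `artifact` and `version` fields alongside `group`, as I understand the service-info spec; the URL template and the `spec_url_for_service` helper are hypothetical.

```python
from typing import Any, Mapping

# Hypothetical permanent-URL scheme; no such GA4GH endpoint is implied.
SPEC_URL_TEMPLATE = "https://specs.example.org/{artifact}/{version}/"


def spec_url_for_service(service_info: Mapping[str, Any]) -> str:
    """Return the URL of the exact spec version a service declares."""
    service_type = service_info["type"]
    return SPEC_URL_TEMPLATE.format(
        artifact=service_type["artifact"],
        version=service_type["version"],
    )


# Example service-info fragment for an htsget 1.2.0 service.
example = {
    "id": "org.example.htsget",
    "type": {"group": "org.ga4gh", "artifact": "htsget", "version": "1.2.0"},
}
print(spec_url_for_service(example))
```

The lookup is only useful if the 1.2.0 document stays at that address after 2.0.0 is released, which is the point of this issue.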

The aim of this issue is not to impose exactly how this is accomplished, but rather to highlight that ensuring the longevity of our standards should be a goal, considering that not every implementing group will be able to adopt the leading edge.

mamanambiya commented 3 years ago

@jb-adams has this been added to the Product Approval process?

jb-adams commented 3 years ago

Thanks @mamanambiya

@susanfairley this is an additional point to consider when developing the new product approval process. In a nutshell, we want to ensure that as work streams develop new versions of existing specs, the older versions remain permanently available. This is so that an implementer always has a reference point for the spec version they've built against, especially considering that not all groups will always be able to keep up with the latest iteration of the spec.

susanfairley commented 3 years ago

This also sounds relevant to the website and making the specs easy for people to find as well as product approval.

jb-adams commented 3 years ago

This also sounds relevant to the website and making the specs easy for people to find as well as product approval.

My thoughts exactly. If we have current and historical versions of GA4GH specs readily available, it will be easy to reference this information via the new website, which will have landing pages for each standard.

mamanambiya commented 3 years ago

After checking, I see that @rishidev proposed some changes in the Previous Version Availability section of the GA4GH Product Approval Submission Form. This looks good to me. Could others please give some feedback so that we can proceed with closing this issue?

susanfairley commented 3 years ago

This was discussed on the TASC call this week. It was noted that there is some complexity here, but that being able to link to previous versions of specifications would likely be useful for the website, and that this should/will be kept in mind during website and product approval work.