NCATSTranslator / Knowledge_Graph_Exchange_Registry

The Biomedical Data Translator Consortium site for development of Knowledge Graph Exchange Standards and Registry
MIT License

Allow user to specify a version for a KG during the registration process #21

Open bill-baumgartner opened 3 years ago

bill-baumgartner commented 3 years ago

Although the KGE metadata file will contain the KG version, it may be useful to allow the user to specify the KG version directly during the registration process. This version, in turn, could potentially be used as a directory name in the backend storage scheme that was demonstrated during the KGE WG call on 3/31.

RichardBruskiewich commented 3 years ago

Sounds good - the timestamp could be the default but users can override.
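As a rough sketch of the "timestamp by default, user can override" idea, combined with the earlier suggestion of using the version as a directory name: the function names, the timestamp format, and the `<kg_name>/<version>` layout below are all assumptions made for illustration, not the actual KGE Archive implementation.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def resolve_version(user_version: Optional[str], now: Optional[datetime] = None) -> str:
    """Return the user-supplied version, or a UTC timestamp if none was given."""
    if user_version and user_version.strip():
        return user_version.strip()
    # Fall back to a timestamp; the exact format here is an assumption.
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%SZ")


def storage_path(kg_name: str, version: str) -> PurePosixPath:
    """Hypothetical backend layout: one directory per KG, one subdirectory per version."""
    return PurePosixPath(kg_name) / version
```

With this sketch, `storage_path("my-kg", resolve_version(None))` yields a timestamp-named folder, while `storage_path("my-kg", resolve_version("1.2"))` uses the version the submitter typed in.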

We need to decide how metadata should be handled: some may be generic to the whole knowledge graph, while other metadata may be specific to a given version.

cbizon commented 3 years ago

I would rather that there's a sequential default and the timestamp is just metadata...

RichardBruskiewich commented 3 years ago

@bill-baumgartner, I've substantially implemented this recommendation in PR https://github.com/NCATSTranslator/Knowledge_Graph_Exchange_Registry/pull/29. @cbizon, I have not yet taken your new recommendation into account, so I'll leave this issue open to revisit your proposal: are you thinking of a straight sequence or a more complex SemVer-style structure?

bill-baumgartner commented 3 years ago

Looks good @RichardBruskiewich. Thanks for the quick turnaround. Good question about version tags for KGs. I have been wondering what to use there as well.

RichardBruskiewich commented 3 years ago

Hi @bill-baumgartner, @cbizon, I contacted @newgene this morning about SmartAPI versioning, and he proposed adding an x-kge tag to SmartAPI, under which the KGE-specific version will be placed. That resolves one concern, namely exactly where to post the KGE-specific version in our SmartAPI entries. The main API 'version' tag will now be reserved for the gross API version, that is, the KGE template version used (basically 1.0.0 when we go live, analogous to the overall TRAPI versioning).

I don't have any strong opinions about the x-kge.version format. A simple sequence id may be ok, or it could be SemVer structured. The KGE working group can discuss this on Slack #knowledge-graph-exchange-working-group channel or in a future KGE Working Group meeting.

We already have version management of the data essentially sorted out for the Archive itself, but we need to clarify to what extent we publish versions to the Translator SmartAPI Registry.

One can envision several scenarios:

  1. That the Registry only maintains one entry per Knowledge Graph, which is the 'latest' (or last uploaded) version.

    pro: doesn't clutter the Registry with too many KGE entries. This is more in line with typical "latest release" publication of information products.

    con: users won't immediately know what other versions are available (although visiting the main KGE Archive will expose the list of versions); owners of graphs may have more than one version that they need to officially share at a given time? Users using SmartAPI in an automatic programmatic fashion(?) won't see other available versions (but this is only a limitation if they use the SmartAPI entry without modification; the "real" Archive will allow access to other versions given suitable endpoint query parameterization). See option 4 below for a possible workaround?

  2. That the Registry publishes all available versions, one SmartAPI entry per version.

    pro: users will see all available versions.

    con: will likely clutter up the Registry with too many KGE entries(?), perhaps a source of confusion to users of the entries (unless one entry is x-kge.version tagged as 'latest'). One suspects that the only difference in KGE SmartAPI entries between versions will generally be the x-kge.version tag (nothing else may change except possibly submitter and IP metadata?), so such duplication in entries is likely unnecessary.

  3. That submitters be given the choice and control over (selectively) sharing more than one version as distinct entries in the Registry.

    pro: owners of graphs can selectively share more than one version if they need to officially share several at a given time?

    con: could still result in a confusing proliferation of specific KG versions in the repository. It may still be useful to tag an entry as x-kge.version: latest.

  4. That only a single KGE entry be published to the Registry, but that the x-kge.version field somehow documents all available versions, tagging one as the 'latest' if necessary (maybe the 'latest' will be obvious, or perhaps release date stamping of the versions can also be provided here?).

Given that we have full control over the x-kge metadata fields of the KGE File Set entries to the Registry, option 4 may be the best idea to pursue.
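To make option 4 concrete, here is a minimal sketch of what a single entry's x-kge.version block could look like when it documents every available version. The `latest` and `others` field names follow the validation schema described later in this thread; the `released` field, the function name, and the sample values are hypothetical and only illustrate the "release date stamping" idea.

```python
from typing import Dict, List


def build_version_record(versions: List[str], release_dates: Dict[str, str]) -> dict:
    """Build the x-kge fragment of a single Registry entry listing all versions.

    `versions` is assumed to be ordered oldest to newest, so the last item
    is the one tagged as 'latest'.
    """
    if not versions:
        raise ValueError("at least one version is required")
    return {
        "x-kge": {
            "version": {
                "latest": versions[-1],
                "others": versions[:-1],
                # Hypothetical field: optional release date per version.
                "released": {v: release_dates[v] for v in versions if v in release_dates},
            }
        }
    }
```

For example, `build_version_record(["1.0", "1.1", "2.0"], {"2.0": "2021-06-01"})` (made-up values) produces one entry whose `latest` is `"2.0"` and whose `others` are `["1.0", "1.1"]`, so the Registry stays uncluttered while older versions remain discoverable.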

Thoughts anyone?

bill-baumgartner commented 3 years ago

I agree with your assessment @RichardBruskiewich. I also think option 4 is the way to go.

newgene commented 3 years ago

+1 on option 4 as @RichardBruskiewich suggested.

colleenXu commented 3 years ago

Based on the comments here (including some consensus around option 4), I created a preliminary SmartAPI validation schema for x-kge.

This is included:

  1. info.x-kge must be present when the kge tag is present, and vice versa
  2. info.x-kge.version is a required field
  3. info.x-kge.version.latest is a required field where the latest version is specified. The other versions can be specified in info.x-kge.version.others.
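The three rules above could be checked along these lines. This is only an illustrative sketch, not the actual SmartAPI validation schema: it assumes the kge tag lives in the top-level OpenAPI `tags` array as `{"name": "kge"}`, and the function name is made up.

```python
from typing import List


def validate_x_kge(entry: dict) -> List[str]:
    """Return a list of rule violations (empty if the entry passes)."""
    errors: List[str] = []
    info = entry.get("info", {})
    # Assumption: tags are OpenAPI Tag Objects, each with a "name" key.
    tag_names = {tag.get("name") for tag in entry.get("tags", [])}
    has_tag = "kge" in tag_names
    has_x_kge = "x-kge" in info

    # Rule 1: info.x-kge and the kge tag must appear together.
    if has_tag != has_x_kge:
        errors.append("info.x-kge and the 'kge' tag must both be present or both absent")
    if not has_x_kge:
        return errors

    # Rule 2: info.x-kge.version is required (expected to be an object).
    x_kge = info["x-kge"]
    version = x_kge.get("version") if isinstance(x_kge, dict) else None
    if not isinstance(version, dict):
        errors.append("info.x-kge.version is required")
        return errors

    # Rule 3: info.x-kge.version.latest is required; 'others' stays optional.
    if not version.get("latest"):
        errors.append("info.x-kge.version.latest is required")
    return errors
```

An entry with the kge tag and `info.x-kge.version.latest` set returns an empty list; an entry with the tag but no `info.x-kge` returns a single rule 1 violation.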

What is still unclear/not set:

RichardBruskiewich commented 3 years ago

@bill-baumgartner highlighted that the Text Mining KP may have frequent incremental versions of their Knowledge Graph. How do we best manage this? Managing KGX "diff" files (tagged as third-level SemVer 'patch' versions?) within one major.minor KGE FileSet folder may be desirable.
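One way to picture the "patch versions inside one major.minor folder" idea is the sketch below. It is only an assumption about how versions might be parsed and grouped; the class name, the regular expression, and the folder naming are hypothetical and not taken from the KGE code base.

```python
import re
from typing import NamedTuple, Optional

# Accepts "major.minor" or "major.minor.patch" with non-negative integers.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")


class FileSetVersion(NamedTuple):
    major: int
    minor: int
    patch: Optional[int]  # None when only major.minor was given

    @property
    def folder(self) -> str:
        """Folder shared by a major.minor release and all of its patches."""
        return f"{self.major}.{self.minor}"


def parse_version(text: str) -> FileSetVersion:
    """Parse a version string, raising ValueError on anything unexpected."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a major.minor[.patch] version: {text!r}")
    major, minor, patch = match.groups()
    return FileSetVersion(int(major), int(minor), int(patch) if patch is not None else None)
```

Under this sketch, `parse_version("1.2")` and `parse_version("1.2.7")` both map to the folder `"1.2"`, so a frequently updated graph would accumulate its "diff" files in one place rather than creating a new file set per update.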

RichardBruskiewich commented 3 years ago

Users now specify SemVer versioning of KGE file sets on the web form during file set registration.

Review above comments for further action items (e.g. on the SmartAPI side, and the text mining issue of many incremental versions)

RichardBruskiewich commented 3 years ago

Core metadata (and file set registration) now includes major.minor SemVer versioning of file sets; however, a few outstanding questions remain:

  1. Are SemVer 'patch' versions needed (especially for Text Mining KGX file sets with frequent updates)?
  2. How should KGE file set releases be managed in the Translator SmartAPI Registry?

Keep open for later review (for subsequent post-September 2021 Relay iteration of KGE?)