Open bill-baumgartner opened 3 years ago
Sounds good - the timestamp could be the default but users can override.
We need to decide how metadata should be handled: some may be generic to the whole knowledge graph; other metadata may be specific to a given version.
I would rather that there's a sequential default and the timestamp is just metadata...
@bill-baumgartner, I've substantially implemented this issue's recommendation in PR https://github.com/NCATSTranslator/Knowledge_Graph_Exchange_Registry/pull/29. That said, @cbizon, I've not yet taken your new recommendation into account, so I'll leave this issue open to revisit your proposal: are you thinking of a straight sequence or a more complex SemVer-style structure?
Looks good @RichardBruskiewich. Thanks for the quick turnaround. Good question about version tags for KGs. I have been wondering what to use there as well.
Hi @bill-baumgartner, @cbizon, I contacted @newgene this morning about SmartAPI versioning and he proposed the addition of an x-kge tag to SmartAPI, under which the KGE-specific version will be placed. That solves one concern, namely exactly where to post the KGE-specific version in our SmartAPI entries. The main API 'version' tag will now be reserved for the gross API version, that is, the KGE template version used (basically 1.0.0 when we go live, analogous to the overall TRAPI versioning).
I don't have any strong opinions about the x-kge.version format. A simple sequence id may be ok, or it could be SemVer structured. The KGE working group can discuss this on the Slack #knowledge-graph-exchange-working-group channel or in a future KGE Working Group meeting.
We already have version management of the data essentially sorted out for the Archive itself, but we need to clarify to what extent we publish versions to the Translator SmartAPI Registry.
One can envision several scenarios:
1. The Registry maintains only one entry per Knowledge Graph, which is the 'latest' (or last uploaded) version.
   pro: doesn't clutter the Registry with too many KGE entries. This is more in line with the typical "latest release" publication of information products.
   con: users won't immediately know what other versions are available (although visiting the main KGE Archive will expose the list of versions); owners of graphs may have more than one version that they need to officially share at a given time? Users consuming SmartAPI in an automatic, programmatic fashion(?) won't see other available versions (but this is only a limitation if they use the SmartAPI entry without modification; the "real" Archive will allow access to other versions given suitable endpoint query parameterization). See option 4 below for a possible workaround?
2. The Registry publishes all available versions, one SmartAPI entry per version.
   pro: users will see all available versions.
   con: will likely clutter up the Registry with too many KGE entries(?), perhaps a source of confusion to users of the entries (unless one entry is tagged x-kge.version 'latest'). One suspects that the only difference in KGE SmartAPI entries between versions will generally be the x-kge.version tag (nothing else may change except possibly submitter and IP metadata?), so such duplication of entries is likely unnecessary.
3. Submitters are given choice and control over (selectively) sharing more than one version as distinct entries in the Registry.
   pro: owners of graphs can selectively share more than one version that they need to officially share at a given time?
   con: could still result in a confusing proliferation of specific KG versions in the repository. It may still be useful to tag an entry as x-kge.version: latest.
4. Only a single KGE entry is published to the Registry, but the x-kge.version field somehow documents all available versions, tagging one as the 'latest' (if necessary; maybe the 'latest' will be obvious, or perhaps release date stamping of the versions can also be provided here?).
Given that we have full control over the x-kge metadata fields of the KGE File Set entries to the Registry, option 4 may be the best idea to pursue.
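To make option 4 a bit more concrete, here is a minimal Python sketch of what a single entry's x-kge block might look like if it listed every available version and flagged one as 'latest'. The function name build_x_kge and the field names (kg_id, versions, released, latest) are hypothetical placeholders for discussion, not an agreed schema, and picking 'latest' by release date is just one possible rule.

```python
from datetime import date


def build_x_kge(kg_id, releases):
    """Build a hypothetical x-kge block for one Registry entry.

    releases: list of (version_string, release_date) tuples.
    The most recently released version is flagged as 'latest'.
    """
    if not releases:
        raise ValueError("at least one release is required")
    # Order by release date so the last element is the newest release.
    ordered = sorted(releases, key=lambda release: release[1])
    latest_version = ordered[-1][0]
    return {
        "kg_id": kg_id,  # hypothetical field name
        "versions": [
            {
                "version": version,
                "released": released.isoformat(),
                "latest": version == latest_version,
            }
            for version, released in ordered
        ],
    }


entry = build_x_kge(
    "example-kg",
    [("1.1", date(2021, 5, 1)), ("1.0", date(2021, 3, 31))],
)
```

With this shape, a client reading the one entry can enumerate entry["versions"] and then ask the Archive for a specific version, which would cover the main drawback of option 1.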
Thoughts anyone?
I agree with your assessment @RichardBruskiewich. I also think option 4 is the way to go.
+1 on option 4 as @RichardBruskiewich suggested.
Based on the comments here (including some consensus around option 4), I created a preliminary SmartAPI validation schema for x-kge.
This is included:
What is still unclear/not set:
@bill-baumgartner highlighted that the Text Mining KP may have frequent incremental versions of their Knowledge Graph. How do we best manage this? Managing KGX "diff" files (tagged as third-level SemVer 'patch' versions?) within one major.minor KGE FileSet folder may be desirable.
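As a rough illustration of the 'patch' idea, the Python sketch below groups version strings by their major.minor prefix, so that incremental diffs would land under one file set folder. The helper names parse_version and group_patches are hypothetical, and treating a bare major.minor version as patch 0 is an assumption made for this sketch, not a decision of the working group.

```python
from collections import defaultdict


def parse_version(text):
    """Parse 'major.minor' or 'major.minor.patch' into a tuple of ints."""
    parts = text.split(".")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a major.minor[.patch] version: {text!r}")
    major, minor = int(parts[0]), int(parts[1])
    # Assumption: a version without a patch level is the base release (patch 0).
    patch = int(parts[2]) if len(parts) == 3 else 0
    return major, minor, patch


def group_patches(versions):
    """Map each 'major.minor' file set to its sorted list of patch levels."""
    groups = defaultdict(list)
    for version in versions:
        major, minor, patch = parse_version(version)
        groups[f"{major}.{minor}"].append(patch)
    return {file_set: sorted(patches) for file_set, patches in groups.items()}


grouped = group_patches(["1.0.2", "1.0", "1.1.1", "1.0.10"])
```

Comparing the parsed integers rather than the raw strings matters here: a plain string sort would place 1.0.10 before 1.0.2.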
Users now specify SemVer versioning of KGE file sets on the web form during file set registration.
Review above comments for further action items (e.g. on the SmartAPI side, and the text mining issue of many incremental versions)
Core metadata (and file set registration) now includes major.minor SemVer versioning of file sets; however, a few outstanding questions remain:
Keep open for later review (for subsequent post-September 2021 Relay iteration of KGE?)
Although the KGE metadata file will contain the KG version, it may be useful to allow the user to specify the KG version directly during the registration process. This version, in turn, could potentially be used as a directory name in the backend storage scheme that was demonstrated during the KGE WG call on 3/31.
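A minimal Python sketch of how a user-supplied version could map onto a storage folder, falling back to a timestamp when no version is given. The function name file_set_folder, the root/kg_id/version layout, and the timestamp format are all assumptions for illustration; the actual backend scheme shown on the WG call may differ.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Conservative allow-list so a version can never escape its parent folder.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def file_set_folder(root, kg_id, version=None, now=None):
    """Return the (hypothetical) storage folder for one KG version."""
    if version is None:
        # Default: a UTC timestamp, which the user may override with a version.
        now = now or datetime.now(timezone.utc)
        version = now.strftime("%Y%m%dT%H%M%SZ")
    for name in (kg_id, version):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe folder name: {name!r}")
    return PurePosixPath(root) / kg_id / version


explicit = file_set_folder("kge-data", "example-kg", "1.0")
defaulted = file_set_folder(
    "kge-data",
    "example-kg",
    now=datetime(2021, 3, 31, 12, 0, 0, tzinfo=timezone.utc),
)
```

Validating the name before building the path keeps a value such as ../etc from being used as a directory name.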