TranslatorSRI / NodeNormalization

Service that produces Translator-compliant nodes given a CURIE
MIT License

Handle Versions #15

Open cbizon opened 4 years ago

cbizon commented 4 years ago

NodeNormalization depends on data from an upstream source, which is currently Babel. Wherever that data comes from, it should be versioned, and those versions should be exposed to users. In some cases, those versions will themselves depend on Biolink Model versions. It's not 100% clear to me how to manage that chain of versions.

gaurav commented 2 years ago

For Babel in general, we currently have four possible levels of versioning:

If the above is good enough, I think we can return the Babel version number (e.g., 2022sep6) with every NodeNorm response so that users know where their data came from.
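A minimal sketch of what returning the Babel version with every response could look like, assuming the service knows which Babel release its loaded data came from. The envelope keys `babel_version` and `results`, the `add_version_info` helper, and the `EXAMPLE:` CURIEs are all hypothetical and are not part of the actual NodeNorm API.

```python
import json

# Hypothetical: the Babel release the currently loaded data was built from.
BABEL_VERSION = "2022sep6"


def add_version_info(normalized: dict) -> dict:
    """Wrap a normalization result so every response carries its data version.

    The envelope keys ("babel_version", "results") are illustrative only.
    """
    return {"babel_version": BABEL_VERSION, "results": normalized}


if __name__ == "__main__":
    # Placeholder CURIEs; not real normalization output.
    result = {"EXAMPLE:1": {"id": {"identifier": "EXAMPLE:2"}}}
    print(json.dumps(add_version_info(result), indent=2))
```

Wrapping the results, rather than adding a field to each node, keeps the per-node structure unchanged, since a single version applies to the whole response.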

If, however, we want to maintain provenance at the clique level, we will need to change how we generate the glom files so that provenance can be tracked and included. Even so, the proposal above about recording source information for each datahandler would still be useful for (1) giving us an overall picture of what's included in a particular Babel release, and (2) producing the source versioning information we would need to actually generate that provenance.

cbizon commented 2 years ago

I don't think that we need per-clique provenance.

I'm not sure the Babel version number alone is enough, though: as you note, the same version of the code could be run multiple times and pull different datasets.
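One way to see why the code version alone is not enough: an identifier derived from the Babel version *and* the version of every input changes whenever any input changes, even if the code does not. The sketch below hashes both together. The `collection_id` function, the source names, and their version strings are hypothetical.

```python
import hashlib
import json


def collection_id(babel_version: str, input_versions: dict[str, str]) -> str:
    """Derive a short identifier for one build of the data.

    Combines the Babel code version with the version of every input source,
    so two runs of the same code that pulled different data get different IDs.
    """
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(
        {"babel": babel_version, "inputs": input_versions}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical input versions: same Babel code, one source updated.
run_a = collection_id("2022sep6", {"MONDO": "2022-08-01", "CHEBI": "213"})
run_b = collection_id("2022sep6", {"MONDO": "2022-09-01", "CHEBI": "213"})
print(run_a, run_b)
```

A hash is compact but opaque; a human-readable date or counter for the collection, with the hash stored alongside it, would keep both properties.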

So I think there should be a version (a date, or just a number) for the overall collection. The Babel version is associated with that overall version, along with the versions of all the inputs. I could also imagine wanting versions for the individual compendium files themselves, similar to how individual chromosome assemblies have their own versions while the collection as a whole also has a version. That way, you could tell that the next version of the collection is the same as the previous one, but with a new compendium.
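A minimal sketch of the layered versioning described above, assuming a release manifest that records the collection version, the Babel version, the input versions, and one version per compendium file. Comparing two manifests then shows exactly which compendia changed between collection versions. The `CollectionManifest` class, `changed_compendia` helper, and the file names and version strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionManifest:
    """Hypothetical release manifest for one NodeNorm data collection."""

    collection_version: str
    babel_version: str
    input_versions: dict[str, str] = field(default_factory=dict)
    compendium_versions: dict[str, str] = field(default_factory=dict)


def changed_compendia(
    old: CollectionManifest, new: CollectionManifest
) -> dict[str, list[str]]:
    """Report which compendium files were added, removed, or re-versioned."""
    old_c, new_c = old.compendium_versions, new.compendium_versions
    return {
        "added": sorted(new_c.keys() - old_c.keys()),
        "removed": sorted(old_c.keys() - new_c.keys()),
        "changed": sorted(
            k for k in old_c.keys() & new_c.keys() if old_c[k] != new_c[k]
        ),
    }


v1 = CollectionManifest("1", "2022sep6", {"MONDO": "2022-08-01"},
                        {"Disease.txt": "1", "Gene.txt": "1"})
v2 = CollectionManifest("2", "2022sep6", {"MONDO": "2022-08-01"},
                        {"Disease.txt": "1", "Gene.txt": "2"})
print(changed_compendia(v1, v2))
# {'added': [], 'removed': [], 'changed': ['Gene.txt']}
```

Here collection version 2 differs from version 1 only in `Gene.txt`, which is the kind of statement ("same as the past one, but with a new compendium") the per-compendium versions make possible.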