Closed dustine32 closed 11 months ago
I think there is something odd about allowing IDs to just sit uncontrolled in a global ID space and using unstructured pseudo-English text. For internally generated IDs, we have an algorithm to guarantee non-colliding names across multiple minervas, but the imports are something we've not really talked about recently.
I believe that either a non-colliding algorithm or UUID should be used, or there should be some other ruleset applied so that any group could contribute without having to cross-check their IDs across all of our current IDs. As well, I think there should be a rule for compactness, as "GLUCOSE-MANNOSYL-CHITO-DOLICHOL-GLUCOSE-MANNOSYL-CHITO-DOLICHOL.ttl" isn't great.
Functionally, not using the algorithm or UUID would mean some kind of light namespacing, like Reactome (not necessarily a resource name: `echo "FOO-PWY" | md5sum | cut -f 1 -d ' ' | awk '{print "YP-" $0}'`).
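For readers who prefer to see the hashed-ID idea outside of a shell pipeline, here is a minimal Python sketch of the same computation. The function name `hashed_model_id` is hypothetical, and the `YP-` prefix is simply the one used in the one-liner above; nothing here is meant as the agreed-upon scheme.

```python
import hashlib


def hashed_model_id(source_id: str, prefix: str = "YP-") -> str:
    """Return prefix + MD5 hex digest of source_id (hypothetical helper).

    `echo` appends a trailing newline, so the newline is included here
    to hash the same bytes as the shell pipeline does.
    """
    digest = hashlib.md5((source_id + "\n").encode("utf-8")).hexdigest()
    return prefix + digest


if __name__ == "__main__":
    # Prints "YP-" followed by 32 hexadecimal characters.
    print(hashed_model_id("FOO-PWY"))
```

The result is compact and deterministic, at the cost of no longer being human-readable, which is the trade-off against light namespacing discussed here.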
From the 2022-08-16 Alliance pathways call, we decided to prepend the `YeastCyc` prefix to the model ID and filename for YeastPathways. We will leave the Reactome ID/filename code alone (e.g., model ID=`R-HSA-350562`, filename=`R-HSA-350562.ttl`).
@suzialeksander @vanaukenk For the `YeastCyc` prefix, just confirming with you: @kltm and I would like to use the `YeastCyc-` prefix (containing a hyphen) rather than `YeastCyc_` (with an underscore) in model IDs/filenames (e.g., model ID=`YeastCyc-SO4ASSIM-PWY`, filename=`YeastCyc-SO4ASSIM-PWY.ttl`). Is this OK with you?
Fixed by #251.
OK, for the YeastPathways import, our first choice would simply be `SGD`, so `gomodel:SGD-SERSYN-PWY`. A rather close second choice is `YeastPathways`, making it `gomodel:YeastPathways-SERSYN-PWY`.
The current name, `YeastCyc`, is not correct.
Thanks @dustine32
Thanks @suzialeksander! Anticipating the import of SGD standard annotations into Noctua, the existing gene-centric model ID convention is to use the MOD gene product ID, e.g., `WB_WBGene00077700`, `MGI_MGI_99187`, `ZFIN_ZDB-GENE-020424-3`. So, as long as there are no SGD gene product IDs that would conflict with the YeastPathways IDs, I think changing the `YeastCyc-` part of the new YeastPathways model IDs (and filenames) to `SGD-` should be OK. And I believe the YeastPathways IDs (220 of them) are already static, so it should be possible to be confident about this.
@suzialeksander Can you confirm this conflict will not (or is unlikely to) occur and I can go ahead and change to the `SGD-` prefix?
Oh crud. I just realized someone may not like the casual alternating of hyphen `-` and underscore `_`. @kltm
@dustine32 Yeah, I'm not wild about this, but I think at this point we have so much "variety" that it may not be worth trying to get the horse back in the stable. It would be good to work out a universal ruleset for different kinds of imports moving forward.
Correct, none of these 220 should cause issues with SGD: `gomodel:SGD-SERSYN-PWY` or `gomodel:SGD_SERSYN-PWY`, whichever keeps the horse happy. SGDIDs should all be something like `SGD:S000001855`.
Decision: gomodel:YeastPathways_SERSYN-PWY
@dustine32
@kltm, this ticket was originally about defining rules. I don't think we've defined any rules, simply come up with a solution for this particular import. Do you want this ticket to be moved somewhere/kept open for discussion?
Otherwise, OK to close.
I'll promise with @dustine32 that we won't forget. That said, https://github.com/geneontology/pathways2GO/issues/189#issuecomment-1546238034 is the template: `pseudo-namespace`_`not-too-long-and-ideally-unique-id`. The details can be worked out as we go as part of SOP.
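As a rough illustration of the `pseudo-namespace`_`not-too-long-and-ideally-unique-id` template, the sketch below builds a model ID and its `.ttl` filename. The function names, the 40-character limit, and the rule that the namespace is alphanumeric only are all assumptions made for the example; the thread leaves those details to the SOP.

```python
import re

# Hypothetical compactness limit; the thread does not fix a number.
MAX_LOCAL_ID_LENGTH = 40


def build_model_id(namespace: str, local_id: str) -> str:
    """Join namespace and local ID with an underscore (assumed rules)."""
    # Assumption: keeping "_" and "-" out of the namespace makes the
    # first underscore an unambiguous separator.
    if not re.fullmatch(r"[A-Za-z0-9]+", namespace):
        raise ValueError(f"namespace must be alphanumeric: {namespace!r}")
    if not local_id or len(local_id) > MAX_LOCAL_ID_LENGTH:
        raise ValueError(f"local ID empty or too long: {local_id!r}")
    return f"{namespace}_{local_id}"


def model_filename(model_id: str) -> str:
    """Model files are named after the model ID with a .ttl extension."""
    return model_id + ".ttl"


if __name__ == "__main__":
    model_id = build_model_id("YeastPathways", "SERSYN-PWY")
    print(model_id)                  # YeastPathways_SERSYN-PWY
    print(model_filename(model_id))  # YeastPathways_SERSYN-PWY.ttl
```

Under this assumed limit, a 63-character local ID such as `GLUCOSE-MANNOSYL-CHITO-DOLICHOL-GLUCOSE-MANNOSYL-CHITO-DOLICHOL` would be rejected, which is one way a compactness rule could be enforced at import time.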
We should discuss rules for model ID minting and file naming in the BioPAX pathway import process. Currently, our two scenarios are:

- Reactome
  - Model ID: `{pathway ID}`, e.g. `R-HSA-350562`
  - Filename: `{pathway ID}.ttl`, e.g. `R-HSA-350562.ttl`
- YeastCyc
  - Model ID: `{pathway ID}`, e.g. `SO4ASSIM-PWY`
  - Filename: `{input filename}-{pathway ID}.ttl`, e.g. `SO4ASSIM-PWY-SO4ASSIM-PWY.ttl`
Note: this redundant naming bug will be fixed in https://github.com/geneontology/pathways2GO/issues/188

Should we add a prefix to these model IDs to help prevent ID collisions for data coming from multiple sources? (From @kltm: "so that any group could contribute without having to cross-check their IDs across all of our current IDs") For example, another MOD could at some point import their own pathway model for assimilatory sulfate reduction I using the same ID `SO4ASSIM-PWY` as YeastCyc. It would also aid in model file management: all YeastCyc models are `models/YeastCyc_*`.

A quick suggestion would be to use the full Pathway Xref ID from the BioPAX to supply the prefix:

- Would result in model ID = `Reactome_R-HSA-350562`, filename = `Reactome_R-HSA-350562.ttl`.
- Would result in model ID = `YeastCyc_SO4ASSIM-PWY`, filename = `YeastCyc_SO4ASSIM-PWY.ttl`.
.Tagging @deustp01 @ukemi @vanaukenk @cmungall @kltm
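To make the collision concern concrete, here is a small Python sketch that checks a list of imports for IDs claimed by more than one source, first with bare pathway IDs and then with a source prefix. `find_collisions` and the source name `OtherMOD` are hypothetical; the pathway IDs are the ones used as examples in this issue.

```python
from typing import Dict, Iterable, List, Tuple


def find_collisions(models: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each model ID claimed by more than one source to those sources."""
    sources_by_id: Dict[str, set] = {}
    for source, model_id in models:
        sources_by_id.setdefault(model_id, set()).add(source)
    return {
        model_id: sorted(sources)
        for model_id, sources in sources_by_id.items()
        if len(sources) > 1
    }


# (source, pathway ID) pairs; "OtherMOD" is a made-up second contributor.
imports = [
    ("YeastCyc", "SO4ASSIM-PWY"),
    ("OtherMOD", "SO4ASSIM-PWY"),
    ("Reactome", "R-HSA-350562"),
]

# Bare pathway IDs: the two SO4ASSIM-PWY models collide.
bare = find_collisions(imports)

# Source-prefixed IDs: each model ID is claimed by exactly one source.
prefixed = find_collisions((src, f"{src}_{pid}") for src, pid in imports)

if __name__ == "__main__":
    print(bare)      # {'SO4ASSIM-PWY': ['OtherMOD', 'YeastCyc']}
    print(prefixed)  # {}
```

With a prefix, a new contributor only has to keep IDs unique within their own namespace rather than cross-checking against every existing model.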