Open jvwong opened 2 years ago
@jvwong It looks like I made some updates in the `unstable` branch to use `UnificationXref` for `ggp` entities and `RelationshipXref` for the other ones. These updates have not been merged into `master` yet, so in the instance that we used in the build, all of them were `RelationshipXref`.
Is it enough to use `UnificationXref` only for `ggp` entities, as is done in the `unstable` branch now? If not, when should we use `RelationshipXref` and when `UnificationXref`?
Sounds like it's worth a try. Is there any reason why the human pathways don't seem to have this problem (in the current instance/master), that is, why they map to UnificationXref correctly?
@jvwong I think what is happening in the `master` branch is this: a `RelationshipXref` is assigned to any entity reference. An organism (the `BioSource` class of Paxtools) is associated with the `EntityReference`, and the organisms are associated with a `UnificationXref`. See https://github.com/PathwayCommons/factoid-converters/blob/master/src/main/java/factoid/model/BioPAXModel.java#L222 and https://github.com/PathwayCommons/factoid-converters/blob/master/src/main/java/factoid/model/BioPAXModel.java#L253.
I wonder if the `UnificationXref`s that you mention are the ones assigned to the organisms?
OK let me know if you have a chance to rebuild the beta. I think this is the only issue required for a v13 release.
Did we fix the biofactoid metadata file (e.g. add logo and pubmed ID)?
I posted this one: https://github.com/PathwayCommons/cpath2/issues/313
I updated the "Factoid binary interactions" Google Doc with some items on how to assign Xrefs (and subclasses), some of which is below:
Biofactoid helps assign external public database identifiers to molecular interaction participants (except for Complex) from ChEBI or NCBI Gene. This is via our grounding-search application.
For small molecules, it is reasonable to assign a UnificationXref to an entity reference (ChEBI).
For genes and their products, it is more appropriate to assign a RelationshipXref, for the simple reason that physical entity types (RNA, PROTEIN) merely reference an underlying gene (locus) but are not identified by it per se. An exception could be made for 'DNA', since in these cases the thing being referred to can be either a pseudogene locus or a transposon locus. When it is possible to map an NCBI Gene record to UniProt, it can be deemed appropriate to assign a UnificationXref, for two reasons: 1) UniProt folds (similar) alternative protein sequences from the same locus under the canonical sequence record; 2) we are effectively assigning NCBI Gene records for the user through the grounding-search top hit. These statements are summarized below:
Table: BioPAX Xref subtypes for Biofactoid interaction participants

| ENTITY_TYPE | DATABASE ID | NCBI_GENE_TYPE | Xref subclass |
| --- | --- | --- | --- |
| Chemical | ChEBI | n/a | UnificationXref |
| GGP | NCBI Gene | 'unknown', 'biological region', 'other' | RelationshipXref |
| DNA | NCBI Gene | 'pseudo', 'transposon' | UnificationXref |
| RNA | NCBI Gene | 'tRNA', 'rRNA', 'snRNA', 'scRNA', 'snoRNA', 'miscRNA', 'ncRNA' | RelationshipXref |
| PROTEIN | NCBI Gene | 'protein-coding' | RelationshipXref |
| PROTEIN | UniProt/SwissProt | n/a | UnificationXref |
| COMPLEX | - | n/a | n/a |
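
To make the table easier to discuss, here is a minimal sketch of the same rules as a lookup function. It is not the factoid-converters implementation: the function name `xref_subclass`, the `RULES` list, and the database labels (`"ChEBI"`, `"NCBI Gene"`, `"UniProt"`) are hypothetical names chosen for illustration.

```python
# Each rule: (entity type, database, allowed NCBI gene types or None, Xref subclass).
# The rows mirror the table above; the names and labels are assumptions, not converter code.
RULES = [
    ("CHEMICAL", "ChEBI", None, "UnificationXref"),
    ("GGP", "NCBI Gene", {"unknown", "biological region", "other"}, "RelationshipXref"),
    ("DNA", "NCBI Gene", {"pseudo", "transposon"}, "UnificationXref"),
    ("RNA", "NCBI Gene",
     {"tRNA", "rRNA", "snRNA", "scRNA", "snoRNA", "miscRNA", "ncRNA"},
     "RelationshipXref"),
    ("PROTEIN", "NCBI Gene", {"protein-coding"}, "RelationshipXref"),
    ("PROTEIN", "UniProt", None, "UnificationXref"),
]


def xref_subclass(entity_type, database, gene_type=None):
    """Return the Xref subclass name for a participant, or None for a Complex."""
    etype = entity_type.upper()
    if etype == "COMPLEX":
        return None  # Complexes are not grounded, so they get no xref.
    for rule_type, rule_db, gene_types, subclass in RULES:
        if rule_type != etype or rule_db != database:
            continue
        # A rule with no gene-type restriction matches any gene_type.
        if gene_types is None or gene_type in gene_types:
            return subclass
    raise ValueError(f"no rule for {entity_type!r}, {database!r}, {gene_type!r}")
```

For example, `xref_subclass("PROTEIN", "NCBI Gene", "protein-coding")` gives `"RelationshipXref"`, while the same protein mapped to UniProt gives `"UnificationXref"`. Combinations the table does not list raise an error rather than guessing a default.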
Background
Currently, there is a 'beta' testing instance of cPath2 accessible at https://beta.pathwaycommons.org/ which is loaded with Pathway Commons v12 data in addition to data exported from Biofactoid.
Issue
In using the web service to retrieve Biofactoid-sourced pathway data in various formats (BioPAX, SIF, TXT, SBGN), I have noticed that in some cases, the non-BioPAX formats return no data.
Notes and clues
a. This issue seems to affect only Biofactoid pathways involving non-human genes/gene products.
I looked through a few of the pathways that didn't involve human genes, and it seems like these universally show the same bug.
b. For non-human pathways, the participants (i.e. proteins) all seem to possess ProteinReferences that reference a RelationshipXref, but never a UnificationXref.
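
Clue (b) can be checked mechanically on an exported BioPAX file. The following is a minimal sketch, assuming BioPAX Level 3 RDF/XML in which xrefs are linked by `rdf:resource` rather than nested inline. The function name and the sample document (including its IDs) are made up for illustration and are not taken from the beta instance.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def node_id(elem):
    """Return the element's rdf:ID or rdf:about, without a leading '#'."""
    value = elem.get(f"{{{RDF}}}ID") or elem.get(f"{{{RDF}}}about") or ""
    return value.lstrip("#")


def references_without_unification(biopax_xml):
    """List ProteinReference IDs whose xrefs include no UnificationXref."""
    root = ET.fromstring(biopax_xml)
    unification_ids = {node_id(x) for x in root.iter(f"{{{BP}}}UnificationXref")}
    missing = []
    for ref in root.iter(f"{{{BP}}}ProteinReference"):
        # Assumption: each xref is a direct child pointing at its target via rdf:resource.
        targets = {
            x.get(f"{{{RDF}}}resource", "").lstrip("#")
            for x in ref.findall(f"{{{BP}}}xref")
        }
        if not targets & unification_ids:
            missing.append(node_id(ref))
    return missing


# Illustrative sample only: one reference with a UnificationXref, one without.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:ProteinReference rdf:ID="human_ref">
    <bp:xref rdf:resource="#ux1"/>
  </bp:ProteinReference>
  <bp:ProteinReference rdf:ID="mouse_ref">
    <bp:xref rdf:resource="#rx1"/>
  </bp:ProteinReference>
  <bp:UnificationXref rdf:ID="ux1"/>
  <bp:RelationshipXref rdf:ID="rx1"/>
</rdf:RDF>"""

print(references_without_unification(SAMPLE))
```

Running this over a human and a non-human Biofactoid export would show whether the difference in clue (b) holds for every ProteinReference or only for some.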