Open DeniseSl22 opened 1 year ago
From the list of different Km values for the same reaction and the same gene I would expect that this means we are looking at splice variants that really are different proteins. Is that correct? In that case, you could map the different transcripts to the respective proteins. But the gene should either be mapped to all of them or (I think practically more useful), indeed only to the reaction and not to the individual proteins.
Could be, but this information on different transcripts might be unknown, or might not have been added to the databases we've investigated. We are, however, dealing with community curation results here, so some people might prefer to add the Ensembl ID rather than the UniProt one. Since the UniProt ID is preferred for modelling proteins in the pathway models, I need to unify to one database for the RDF model. Therefore, mappings from the Ensembl ID to a corresponding UniProt ID might be needed, which could lead to one-to-many mappings. In that case I will prefer the approved UniProt ID; we cannot infer which transcript will be expressed most, since we have no knowledge of the tissue in which the reactions occur. Linking only to the interaction ID through Rhea is not enough.
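To make the one-to-many case concrete, here is a minimal sketch of the unification step. It assumes the mapping rows are already available as tuples with an "approved" flag; the identifiers, the flag, and the function names are hypothetical placeholders, not real database entries or an existing API.

```python
from collections import defaultdict

# Hypothetical mapping rows: (Ensembl ID, UniProt ID, approved flag).
# All identifiers are placeholders, not real database entries.
MAPPINGS = [
    ("ENSG_EXAMPLE_1", "UNIPROT_A", True),
    ("ENSG_EXAMPLE_1", "UNIPROT_B", False),
    ("ENSG_EXAMPLE_2", "UNIPROT_C", False),
    ("ENSG_EXAMPLE_2", "UNIPROT_D", False),
]


def build_index(rows):
    """Group UniProt candidates by Ensembl ID, keeping input order."""
    index = defaultdict(list)
    for ensembl_id, uniprot_id, approved in rows:
        index[ensembl_id].append((uniprot_id, approved))
    return index


def unify_to_uniprot(ensembl_id, index):
    """Return (chosen UniProt ID or None, list of all candidates).

    A UniProt ID is chosen only when exactly one candidate is approved;
    otherwise the choice is left open and all candidates are returned.
    """
    candidates = index.get(ensembl_id, [])
    approved = [uid for uid, is_approved in candidates if is_approved]
    chosen = approved[0] if len(approved) == 1 else None
    return chosen, [uid for uid, _ in candidates]


if __name__ == "__main__":
    index = build_index(MAPPINGS)
    for gene in ("ENSG_EXAMPLE_1", "ENSG_EXAMPLE_2"):
        print(gene, unify_to_uniprot(gene, index))
```

Returning the full candidate list alongside the chosen ID keeps the ambiguous cases visible instead of silently picking one.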
I am not sure I agree. There are 3 aspects.
1) Creating unified RDF. In the current WikiPathways RDF approach we do two things. First, we include whatever ID the curator used. The idea is that at application time users/analysts can map from that ID with a mapping service like BridgeDb, according to their own needs and preferences. When we created the RDF we thought that was the preferred way. But since we realized that this might lead to confusion, and that some people might not have the necessary skills, we also added mappings to the most relevant ID types, indeed including Ensembl and UniProt. We did realize that our knowledge of biology, and in fact biology itself, does not always allow us to do that perfectly.
2) Transcripts and their translation products. Our knowledge about the different types of transcripts that occur because of, for instance, splice variants, and especially our knowledge about post-translational modifications and activity-determining interactions, is extremely limited, while on the other hand the number of products that have somehow been identified is enormous. For these reasons combined, we have not included transcripts in our mappings. Or rather, I think we removed them at some stage because they were huge in size and hardly useful. I could imagine that we have to reconsider that for work like yours (but see 3 first).
3) Of course you are right that we typically think of proteins when we think of reaction activity, and by definition, when protein involvement is proven (traditionally by repeating measurements after protein denaturation), we talk about enzyme activity. However, that does not mean that all reactions that we see occurring are fully enzymatic. We see both spontaneous activity and activity catalyzed by other biomolecules or surfaces. More importantly, it does not mean that we know which proteins are responsible for what part of an enzymatic activity, and that can depend strongly on conditions like the presence of competing enzymes or the physicochemical conditions in different cell types.
For that reason, I think it is still best to describe a reaction with a reaction ID and map that to an Enzyme code first, if we know the reaction is mostly enzymatic. A mapping to proteins, including isoenzymes, splice variants, and post-translational modifications, can then be done during analysis and need not, and I would argue currently should not, be part of the RDF. The typical situation is still that we see a certain amount of activity under certain experimental conditions, which can be explained to some extent by the presence of known proteins.
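As a rough illustration of that split, the sketch below stores only the reaction ID and an optional Enzyme code on the record, and resolves proteins at analysis time from a lookup the analyst supplies. The class, the function, and every identifier are hypothetical and only serve to show the shape of the idea; this is not how the WikiPathways RDF is actually structured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReactionRecord:
    """What would be stored: a reaction ID plus an optional Enzyme code."""
    reaction_id: str
    # None when the reaction is not known to be mostly enzymatic.
    enzyme_code: Optional[str] = None


def proteins_for(record: ReactionRecord,
                 code_to_proteins: Dict[str, List[str]]) -> List[str]:
    """Resolve proteins at analysis time from an analyst-supplied lookup.

    The lookup can include isoenzymes or splice variants as the analyst
    sees fit; none of that is stored on the record itself.
    """
    if record.enzyme_code is None:
        return []
    return sorted(code_to_proteins.get(record.enzyme_code, []))


if __name__ == "__main__":
    # Placeholder identifiers, not real database entries.
    lookup = {"CODE_EXAMPLE_1": ["PROTEIN_B", "PROTEIN_A"]}
    enzymatic = ReactionRecord("RXN_EXAMPLE_1", "CODE_EXAMPLE_1")
    spontaneous = ReactionRecord("RXN_EXAMPLE_2")
    print(proteins_for(enzymatic, lookup))
    print(proteins_for(spontaneous, lookup))
```

Because the lookup is passed in per analysis, different analysts can apply different protein mappings to the same stored reaction.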
Dear Chris, thank you for your thoughts on this subject. However, I do not wish to change my model at the last minute; I have given the current model proper thought, and I believe it will work well with the current knowledge captured in kinetic databases and publications.
I'll answer this in reverse order for clarity:
Example: