geneontology / annotation_extensions

Documentation, tickets & usage reports for annotation extension relations.

gene product/transcript/gene ID decision? #65

Open RLovering opened 8 years ago

RLovering commented 8 years ago

Hi All

I couldn't see a ticket for this one.

This issue was raised in Geneva in December 2015. Every day we are using IDs in the AE field which potentially will have to be updated (for human data at least). Tony has identified that this will not be easily resolved simply by mapping IDs, due to problems correlating Ensembl IDs with UniProt IDs.

Please can people consider how this issue can be resolved. How can we encourage tool providers to use this data if the IDs in this field are going to change?

I have included the email discussions I have below (there may have been more).

Best

Ruth

Friday, 11 December 2015 09:04

Hi All

Following our discussion on Tuesday I think the following was agreed:

Using a protein ID when discussing regulation of transcription causes problems because it is gene expression that is regulated, not a protein. However, if we consider that a MOD gene ID, a UniProtKB protein ID or an RNAcentral ID can be used simply to represent the 'gene'/'gene product' etc., depending on what type of object is being regulated according to the GO term used in the annotation, then these IDs should be used in both the WITH field and the annotation extension field.

Therefore, the proposal is that MOD gene IDs, UniProtKB protein IDs or RNAcentral IDs will be used to represent the gene/gene product objects the GO term relates to.

Not sure I have written this very well.

However, David and Kimberley have agreed to circulate an email about this in the very near future to give GO consortium members an opportunity to comment on this decision before it is finalised.

It was also recognised that there is an urgency to this decision as Rachael’s microRNA guideline paper has been submitted to RNA and that if accepted the published version needs to include the appropriate ID information.

All the best

Ruth

Friday, 11 December 2015 10:18

Agreed. As long as we avoid using protein IDs to stand in for genes, then I'm happy with the solution. The alternative makes defining relations and their ranges hard and is likely to cause consistency problems for OWL versions of our data.

Cheers, David

[Note: David hadn't appreciated that the suggestion was to use protein IDs to stand in for genes, probably due to bad writing by me.]

@cmungall @dosumis @rachhuntley @mcourtot @rebeccafoulger @tonysawfordebi @ukemi @thomaspd

vanaukenk commented 8 years ago

@RLovering I am putting this item on the agenda for Tuesday's annotation call. This was discussed on the 2015-12-16 managers call, where we agreed that the proposal would be to use MOD gene IDs, UniProtKB accessions, RNAcentral IDs, IntAct complex IDs (and also PRO IDs?) to indicate genes/gene products/complexes.
Please see: http://wiki.geneontology.org/index.php/Manager_Call_2015-12-16

@dosumis - I believe this would mean that curators can use UniProtKB IDs to indicate a gene in Col. 16 annotation extensions, for example, but this is consistent with how we use identifiers in Col. 2. Can you comment further on this wrt concerns about defining ranges and OWL representation?
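To make the ID-type question concrete, here is a minimal Python sketch that reads a Col. 16 value and reports which kind of object each identifier stands for, judged only by its prefix. The prefix table, the function name `classify_extensions` and the example IDs are assumptions made for illustration, not an official GO list or tool. It assumes the usual `relation(ID)` syntax, with `|` and `,` separating extensions.

```python
import re

# Illustrative prefix table: an assumption for this sketch, not an official GO list.
ID_TYPE_BY_PREFIX = {
    "UniProtKB": "protein",
    "RNAcentral": "RNA",
    "IntAct": "complex",
    "PR": "PRO entity",
    "MGI": "MOD gene",
    "SGD": "MOD gene",
    "FB": "MOD gene",
    "WB": "MOD gene",
    "ZFIN": "MOD gene",
}

# One extension looks like relation(PREFIX:local_id).
EXTENSION = re.compile(
    r"^(?P<relation>\w+)\((?P<prefix>[^:()]+):(?P<local_id>[^()]+)\)$"
)


def classify_extensions(col16):
    """Return (relation, ID, id_type) for every relation(ID) in a Col. 16 value."""
    results = []
    if not col16:
        return results
    # '|' and ',' both separate individual extensions; split on either.
    for part in re.split(r"[|,]", col16):
        match = EXTENSION.match(part.strip())
        if match is None:
            raise ValueError(f"unparseable extension: {part!r}")
        prefix = match["prefix"]
        full_id = f"{prefix}:{match['local_id']}"
        id_type = ID_TYPE_BY_PREFIX.get(prefix, "unknown")
        results.append((match["relation"], full_id, id_type))
    return results
```

A check along these lines could flag annotations where a gene-level relation points at a protein-type ID, which is the case the range and OWL concerns are about.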

RLovering commented 8 years ago

It has also occurred to me that we need to consider how LEGO is creating (or will create) the equivalent C16 annotations. I have a feeling that regulation of transcription of a human gene would link to regulation of a UniProt protein ID.

dosumis commented 8 years ago

> it has also occurred to me that we need to consider how LEGO is creating (or will create) the equivalent C16 annotations. I have a feeling that regulation of transcription of a human gene would link to regulation of a UniProt protein IDs

Patterns definitely need to be co-ordinated. LEGO shouldn't need the shortcut relations we've created for annotation extensions, but still needs a convention (pattern) for how to record regulation of gene expression. Ideally that pattern would be consistent with what we've done for AE relations. If they are free to choose regulates translation -> protein and regulates expression -> gene then LEGO annotations won't group properly.
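The grouping problem can be shown with a small Python sketch. Everything in it is hypothetical: the relation names, the placeholder IDs and the `gene_for` mapping are made up for illustration and do not come from LEGO or from any GO file.

```python
from collections import defaultdict

# Hypothetical annotations as (relation, target ID) pairs.
# Both targets are meant to stand for the same underlying gene.
annotations = [
    ("regulates_expression_of", "MOD:gene1"),
    ("regulates_translation_of", "UniProtKB:protein1"),
]


def group_by_target(annotations, gene_for=None):
    """Group relations by target ID, optionally normalising IDs to a gene first."""
    gene_for = gene_for or {}
    groups = defaultdict(list)
    for relation, target in annotations:
        # Fall back to the raw ID when no gene mapping is known.
        groups[gene_for.get(target, target)].append(relation)
    return dict(groups)


# Without a shared convention the two annotations land in separate groups.
ungrouped = group_by_target(annotations)

# With an (assumed) protein-to-gene mapping they group together.
grouped = group_by_target(annotations, {"UniProtKB:protein1": "MOD:gene1"})
```

Here `ungrouped` has two keys and `grouped` has one. The second case relies on exactly the kind of ID mapping that Tony found unreliable for human data, which is why an agreed pattern is preferable to after-the-fact normalisation.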