The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

Delete children of 'C_D_box_snoRNA' and 'H_ACA_box_snoRNA' #576

Open sjm41 opened 2 years ago

sjm41 commented 2 years ago

Following discussions amongst RNAcentral members (and specifically with Michelle Scott and Ruth Seal), we'd like to propose deleting the children of 'C_D_box_snoRNA' and 'H_ACA_box_snoRNA' (in the ncRNA, ncRNA_gene and primary_transcript branches of the SO) - their IDs could be added as secondary IDs to the current parent terms.

The reason is that these snoRNA subtypes correspond to either functional terms or discrete Rfam families, which both seem to be outside the scope of SO and SO annotation. Details below.

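snoRNA terms defined based on function:
methylation_guide_snoRNA (SO:0005841)
methylation_guide_snoRNA_primary_transcript (SO:0000580)
methylation_guide_snoRNA_gene (SO:0002379)
pseudouridylation_guide_snoRNA (SO:0001187)
pseudouridylation_guide_snoRNA_gene (SO:0002380)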

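snoRNA terms based on specific Rfam families:
U3_snoRNA (SO:0001179)
U3_snoRNA_gene (SO:0002378)
U14_snoRNA (SO:0000403)
U14_snoRNA_primary_transcript (SO:0005837)
U14_snoRNA_gene (SO:0002377)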

egchristensen commented 2 years ago

@keilbeck There are a few PMIDs attached to these terms, but they're all over 20 years old. I see you're listed as a reference for methylation_guide_snoRNA_primary_transcript (SO:0000580). Were these branches added for a particular group or use case? If so, I just want to do some digging before obsoleting anything.

keilbeck commented 2 years ago

I believe Rfam asked for these terms, but it was a long time ago. If Rfam does not think they are appropriate, then we should go with the experts. Maybe we can find out if there is any usage of these terms and let people know the better way to annotate?

sjm41 commented 2 years ago

@blakesweeney Does Rfam use/need these snoRNA SO terms:

U3_snoRNA (SO:0001179)
U3_snoRNA_gene (SO:0002378)
U14_snoRNA (SO:0000403)
U14_snoRNA_primary_transcript (SO:0005837)
U14_snoRNA_gene (SO:0002377)

The proposal in the original post is to obsolete them as 'out of scope' for SO (similar reasoning to #546).

blakesweeney commented 2 years ago

While Rfam uses the U3_snoRNA and U14_snoRNA terms, I'd be fine with removing them for similar reasons as #546. We can switch to the correct snoRNA terms instead. I'd also like to say the terms mentioned in #546 would be good candidates to remove.

sjm41 commented 2 years ago

Maybe we can find out if there is any usage of these terms and let people know the better way to annotate?

I don't know the best way of doing this, but the only relevant hits from a Google search for each of the 10 SO IDs listed above are for sequenceontology.org or the SO GitHub page. So it seems that none of these 10 terms are used much, if at all. I think they can be safely obsoleted. Their IDs could be added as secondary IDs to their parent terms? We could also add a note to each obsoleted term explaining the reason for obsoletion and pointing to the relevant Rfam or GO term.
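Beyond a web search, usage could also be checked by scanning annotation files directly. The following is only a minimal sketch, assuming the annotations are in GFF3, where column 3 (the feature type) holds either an SO term name or an SO accession. The function name `count_term_usage` is hypothetical, and nothing in this thread says any group actually ran such a check.

```python
from collections import Counter

# The 10 terms proposed for obsoletion in this issue (term name -> SO ID).
CANDIDATE_TERMS = {
    "methylation_guide_snoRNA": "SO:0005841",
    "methylation_guide_snoRNA_primary_transcript": "SO:0000580",
    "methylation_guide_snoRNA_gene": "SO:0002379",
    "pseudouridylation_guide_snoRNA": "SO:0001187",
    "pseudouridylation_guide_snoRNA_gene": "SO:0002380",
    "U3_snoRNA": "SO:0001179",
    "U3_snoRNA_gene": "SO:0002378",
    "U14_snoRNA": "SO:0000403",
    "U14_snoRNA_primary_transcript": "SO:0005837",
    "U14_snoRNA_gene": "SO:0002377",
}


def count_term_usage(gff3_lines):
    """Count features whose type column names one of the candidate terms.

    Hypothetical helper: accepts any iterable of GFF3 lines and matches
    column 3 against either the term name or the SO accession.
    """
    id_to_name = {so_id: name for name, so_id in CANDIDATE_TERMS.items()}
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 3:
            continue
        feature_type = columns[2]
        if feature_type in CANDIDATE_TERMS:
            counts[feature_type] += 1
        elif feature_type in id_to_name:
            counts[id_to_name[feature_type]] += 1
    return counts
```

Passing an open file handle (for example `count_term_usage(open("annotations.gff3"))`, with a file name of your own) would give a per-term tally; an empty result for a large corpus would support the view that the terms are unused.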

egchristensen commented 2 years ago

I think we can go ahead and obsolete these terms, but leave a comment redirecting people to the proper annotation source. @sjm41, could I ask you to identify the right Rfam/GO IDs or links to include as I obsolete these? The terms will remain in SO, but they will be disconnected from their parents and marked as obsolete, with a reference to where people should go from now on.

sjm41 commented 1 year ago

methylation_guide_snoRNA (SO:0005841) = GO:0030561
methylation_guide_snoRNA_primary_transcript (SO:0000580) = GO:0030561
methylation_guide_snoRNA_gene (SO:0002379) = GO:0030561

pseudouridylation_guide_snoRNA (SO:0001187) = GO:0030558
pseudouridylation_guide_snoRNA_gene (SO:0002380) = GO:0030558

U3_snoRNA (SO:0001179) = RF00012
U3_snoRNA_gene (SO:0002378) = n/a or RF00012

U14_snoRNA (SO:0000403) = RF00016
U14_snoRNA_primary_transcript (SO:0005837) = n/a or RF00016
U14_snoRNA_gene (SO:0002377) = n/a or RF00016
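
For anyone scripting the obsoletion notes, the mapping above could be captured as a small lookup table. This is only a sketch: the `REDIRECTS` structure, the `obsoletion_comment` helper and the comment wording are hypothetical, not the text that will actually be added to SO. Entries given above as "n/a or RF…" are flagged as tentative rather than resolved either way.

```python
# SO ID -> (term name, suggested GO/Rfam ID, tentative?)
# "tentative" marks the entries listed as "n/a or RF..." in this thread.
REDIRECTS = {
    "SO:0005841": ("methylation_guide_snoRNA", "GO:0030561", False),
    "SO:0000580": ("methylation_guide_snoRNA_primary_transcript", "GO:0030561", False),
    "SO:0002379": ("methylation_guide_snoRNA_gene", "GO:0030561", False),
    "SO:0001187": ("pseudouridylation_guide_snoRNA", "GO:0030558", False),
    "SO:0002380": ("pseudouridylation_guide_snoRNA_gene", "GO:0030558", False),
    "SO:0001179": ("U3_snoRNA", "RF00012", False),
    "SO:0002378": ("U3_snoRNA_gene", "RF00012", True),
    "SO:0000403": ("U14_snoRNA", "RF00016", False),
    "SO:0005837": ("U14_snoRNA_primary_transcript", "RF00016", True),
    "SO:0002377": ("U14_snoRNA_gene", "RF00016", True),
}


def obsoletion_comment(so_id):
    """Build an illustrative redirect note for one obsoleted SO term."""
    name, target, tentative = REDIRECTS[so_id]
    source = "GO" if target.startswith("GO:") else "Rfam"
    hedge = ", may not apply" if tentative else ""
    return (
        f"{name} ({so_id}): obsoleted as out of scope for SO; "
        f"see {target} ({source}{hedge})."
    )


if __name__ == "__main__":
    for so_id in REDIRECTS:
        print(obsoletion_comment(so_id))
```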