murphyte closed this issue 6 years ago
You can send us all the INSDC terms that you would like to map, even if you see that your term has already been added as a synonym. We can specify that the synonym is from INSDC (e.g., INSDC:-10_signal). We have done this for terms we have mapped that come from variant annotation tools. Your synonym would appear like this in the browser:
I couldn't find the term proprotein in the INSDC feature table spec. Is this term from a different controlled vocabulary?
@keilbeck Will it be a problem to map terms that are qualifiers and not feature types?
The INSDC spec is specific to nucleotide sequences. It includes some protein-related features, like signal peptides and mature peptides, but only in the context of annotating a range on a nucleotide sequence that overlaps a CDS feature. Beyond the INSDC spec and "GenBank" format, NCBI has the GenPept flatfile format for protein records, which includes some additional features that can be annotated on proteins. If those features don't have a nucleotide equivalent, they're converted to another feature type like "misc_feature" when projected from the protein onto a nucleotide record in GenBank flatfile format. Does that make sense?
Here's a protein record in GenPept format with a 'proprotein' example: https://www.ncbi.nlm.nih.gov/protein/NP_001165705.1
On top of that, there are a few additions to the INSDC specs that are coming soon (approved but not yet added to documentation). propeptide is one of those, meaning a 'proprotein' feature annotated on a protein sequence will be displayed as a 'propeptide' feature when projected onto a corresponding nucleotide sequence.
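To make the projection behavior described above concrete, here is a minimal Python sketch. It is only an illustration, not NCBI's conversion code: the table name, the function name, and the idea of a single dictionary lookup are assumptions. The only pair taken from this thread is proprotein to propeptide, and "misc_feature" is used as the fallback because it is the example given above.

```python
# Hypothetical projection table: GenPept (protein) feature type -> feature
# type shown on the corresponding nucleotide record. Only the
# proprotein -> propeptide pair comes from this thread; a real table
# would hold many more entries.
PROTEIN_TO_NUCLEOTIDE = {
    "proprotein": "propeptide",
}

# Fallback used when a protein feature has no nucleotide equivalent
# ("misc_feature" is the example named in the discussion).
FALLBACK_FEATURE = "misc_feature"


def project_feature_type(protein_feature: str) -> str:
    """Return the nucleotide feature type for a protein feature type."""
    return PROTEIN_TO_NUCLEOTIDE.get(protein_feature, FALLBACK_FEATURE)


if __name__ == "__main__":
    print(project_feature_type("proprotein"))
    print(project_feature_type("feature_without_equivalent"))  # placeholder name
```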
I'm not aware of public documentation with the full list of extra feature types supported in GenPept on top of what's in the INSDC feature spec. I could identify them in the full conversion table, and they could be added as "NCBI:" or "GenPept:" synonyms, instead of "INSDC:"
I should also say that this isn't a formally-endorsed INSDC:SO mapping table, and I'm thinking I should check into that before formally labeling these as "INSDC:" synonyms in the SO specs. I'll do that before providing the full table.
WRT the issue of mapping to feature type vs. qualifier, I'm looking at this as having two use cases:
For the first use case, I'd want just the regulatory_class qualifier value "INSDC:ribosome_binding_site" to map to the SO term ribosome_entry_site (SO:0000139). For the second use case, it's helpful to know both the feature and qualifier values, as in "INSDC:regulatory-ribosome_binding_site". I'm not sure what the best way to express that in the SO records would be.
Best regards,
-Terence
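The two synonym forms discussed in the comment above ("INSDC:ribosome_binding_site" and "INSDC:regulatory-ribosome_binding_site") could be generated from one helper. This is a sketch only: the function name and its parameters are hypothetical, and the hyphen-joined form simply mirrors the example string in the comment rather than any agreed SO convention.

```python
from typing import Optional


def insdc_synonym(feature: str,
                  class_value: Optional[str] = None,
                  include_feature: bool = False) -> str:
    """Build an "INSDC:"-prefixed synonym label (hypothetical helper).

    feature         -- INSDC feature type, e.g. "regulatory"
    class_value     -- value of the feature's class qualifier, if any
    include_feature -- if True, keep the feature type in the label
    """
    if class_value is None:
        # Plain feature type, no class qualifier involved.
        return f"INSDC:{feature}"
    if include_feature:
        # Second form: feature type and qualifier value together.
        return f"INSDC:{feature}-{class_value}"
    # First form: qualifier value only.
    return f"INSDC:{class_value}"


if __name__ == "__main__":
    print(insdc_synonym("regulatory", "ribosome_binding_site"))
    print(insdc_synonym("regulatory", "ribosome_binding_site",
                        include_feature=True))
```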
Noting the existence of this ancient INSDC:SO correspondence table: http://sequenceontology.org/resources/mapping/FT_SO.html
There's a broken link to that page at: http://www.sequenceontology.org/resources/faq.html#map
I'm working on formalizing my mapping table with the INSD collaborators, but I think it's about done. Hopefully I'll be able to get the file to you next week, and we can discuss further how to fit it into your official synonym data model.
-Terence
I won't call this an official INSDC:SO mapping table, but I'm attaching the mapping that we're currently using for conversion between INSDC (or more specifically NCBI ASN.1) and SO terms. A few comments about the mapping:
Please take a look and see what you think of my proposed solution for adding INSDC triplets to synonyms, or if there's a better way. But it seems like getting the mapping into SO would save others the effort of repeating the exercise.
-Terence
I have added all of the INSDC mappings. Please let me know if there are any additional changes or mappings that need to be added.
Hi SO -- we're continuing to revise our mappings of INSDC to SO terms used in some code at NCBI, and I think it would make sense to add many of the INSDC terms as synonyms in SO. I've reviewed the list, and please consider adding the terms in the table below. For this list, I've included underscores because those are part of the INSDC term, but I did not list anything where the INSDC term is already present if you treat underscores and spaces as equivalent. That is, for the SO term "minus_10_signal", the INSDC term is "-10_signal" (with an underscore), which I didn't include here because "-10 signal" is already present. Let me know if you think there would be some value in including the INSDC synonym with the underscore, and I'd be happy to add those.
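For anyone wanting to repeat the filtering step described above, here is a rough Python sketch of "treat underscores and spaces as equivalent". The function names are made up for illustration, and "some_new_term" is a placeholder rather than a real INSDC term.

```python
from typing import Iterable, List


def normalize(term: str) -> str:
    """Treat underscores and spaces as equivalent."""
    return term.replace("_", " ").strip()


def missing_synonyms(insdc_terms: Iterable[str],
                     existing_synonyms: Iterable[str]) -> List[str]:
    """Return the INSDC terms not already present among the SO synonyms."""
    seen = {normalize(s) for s in existing_synonyms}
    return [t for t in insdc_terms if normalize(t) not in seen]


if __name__ == "__main__":
    # Names already attached to the SO term minus_10_signal (from the
    # example above).
    existing = ["minus_10_signal", "-10 signal"]
    # "-10_signal" is skipped because "-10 signal" is already present;
    # "some_new_term" is a placeholder for a term that would be proposed.
    print(missing_synonyms(["-10_signal", "some_new_term"], existing))
```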
There might be some value in taking this one step further and adding a separate "INSDC term" field in SO rather than just having these mixed in as synonyms. If that's of interest, I can send along our full list. It's a little tricky because in some cases the mapping is directly to an INSDC feature type, whereas others are mapped to a "class" qualifier on a more general INSDC feature. For example, the SO term "minus_35_signal" corresponds to an INSDC "regulatory" feature with the qualifier /regulatory_class="minus_35_signal". This is true for the INSDC feature types "regulatory", "ncRNA", and "misc_recomb", which each have a "class" qualifier providing more specificity (often synonymous with SO). We can discuss it more if you think going this more formal route could be worthwhile.
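As a sketch of why a separate "INSDC term" field is trickier than a flat synonym, the lookup below handles both cases: a direct feature-type mapping and a mapping through a class qualifier. The dictionary and function names are hypothetical, and the tables hold only examples mentioned in this thread. The qualifier names used for ncRNA and misc_recomb (ncRNA_class, recombination_class) are taken from the INSDC feature table rather than from this thread.

```python
from typing import Dict, Optional, Tuple

# Feature types that carry a "class" qualifier, and that qualifier's name.
CLASS_QUALIFIER: Dict[str, str] = {
    "regulatory": "regulatory_class",
    "ncRNA": "ncRNA_class",
    "misc_recomb": "recombination_class",
}

# Illustrative entries only; a real table would be much larger.
SO_BY_FEATURE_AND_CLASS: Dict[Tuple[str, str], str] = {
    ("regulatory", "minus_35_signal"): "minus_35_signal",
    ("regulatory", "ribosome_binding_site"): "ribosome_entry_site",
}
SO_BY_FEATURE: Dict[str, str] = {
    "-10_signal": "minus_10_signal",
}


def so_term(feature: str, qualifiers: Dict[str, str]) -> Optional[str]:
    """Return the SO term name for an INSDC feature, or None if unmapped."""
    qualifier_name = CLASS_QUALIFIER.get(feature)
    if qualifier_name is not None:
        # Mapping depends on the value of the class qualifier.
        value = qualifiers.get(qualifier_name)
        if value is None:
            return None
        return SO_BY_FEATURE_AND_CLASS.get((feature, value))
    # Mapping is directly from the feature type.
    return SO_BY_FEATURE.get(feature)


if __name__ == "__main__":
    print(so_term("regulatory", {"regulatory_class": "minus_35_signal"}))
    print(so_term("-10_signal", {}))
```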
For review, the INSDC feature table specs are at: http://www.insdc.org/files/feature_table.html
But for starters, here is my list of proposed synonyms to add:
Thanks!
-Terence Murphy