belbio / bel_resources

BEL resources and tools for namespaces, annotations, orthologs, taxonomy, etc.
https://bel-resources.rtfd.io
Apache License 2.0

remap SwissProt to terminology schema #99

Open ncatlett opened 6 years ago

ncatlett commented 6 years ago

It would be useful for the primary accession to be used as the src_id ("canonical id from source database - used in the url_template") to enable conversions from BEL networks to SwissProt accessions as well as use of the url template.

I'd suggest mapping as follows (but feel most strongly about the src_id, label, and expansion of the synonyms to the protein names):

- src_id: use the primary accession (1st accession in the list associated with an entry)
- namespace_value: use the entry name/mnemonic identifier (e.g., IL7_HUMAN)
- id: e.g., SP:IL7_HUMAN
- label: use the entry name/mnemonic identifier (e.g., IL7_HUMAN)
- name: use the protein recommended name (e.g., Interleukin-7)
- description: omit, or reuse protein name for completeness
- synonyms: should include protein names (recommended names short/full, alternative names short/full) and potentially gene names (synonyms)
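The mapping suggested above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `swissprot_to_term` and its parameters are hypothetical and are not part of bel_resources, the input is assumed to be already parsed out of a UniProt entry, and the "IL-7" short name and "IL7" gene name in the usage example are illustrative values for the IL7_HUMAN entry.

```python
from typing import Dict, List


def swissprot_to_term(
    accessions: List[str],
    entry_name: str,
    recommended_name: str,
    other_names: List[str],
    gene_names: List[str],
) -> Dict[str, object]:
    """Build a term record following the mapping proposed in this issue.

    Hypothetical helper: the field names come from the issue text, but the
    function itself is not an existing bel_resources API.
    """
    if not accessions:
        raise ValueError("entry has no accession")

    # Synonyms: protein names first, then gene names, de-duplicated
    # while preserving order.
    synonyms: List[str] = []
    for name in [recommended_name, *other_names, *gene_names]:
        if name and name not in synonyms:
            synonyms.append(name)

    return {
        "src_id": accessions[0],          # primary accession = first in list
        "namespace_value": entry_name,    # entry name / mnemonic
        "id": f"SP:{entry_name}",
        "label": entry_name,
        "name": recommended_name,
        "description": recommended_name,  # reused for completeness
        "synonyms": synonyms,
    }


# Illustrative values for the IL7_HUMAN example.
term = swissprot_to_term(
    accessions=["P13232"],
    entry_name="IL7_HUMAN",
    recommended_name="Interleukin-7",
    other_names=["IL-7"],
    gene_names=["IL7"],
)
```

Keeping the accession in src_id and the mnemonic in namespace_value means a BEL network that uses SP:IL7_HUMAN can still be converted to an accession with a simple lookup.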

Entry information documentation - https://www.uniprot.org/help/entry_information_section
Protein names documentation - https://www.uniprot.org/help/protein_names

epichler commented 5 years ago

I agree with @ncatlett.

In the case of UniProt, the src_id should equal the first (primary) UniProt accession number. As to the other attributes, there are several options - which ones to choose will depend on the business rules for processing UniProt attributes.
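As a rough sketch of how the primary accession could be picked out, the snippet below reads the `AC` lines of an entry in the UniProt text (flat-file) format, where accessions are separated by semicolons and may span several `AC` lines. The function names are hypothetical, and the secondary accessions in the sample are made-up placeholders rather than real values for this entry.

```python
from typing import List


def parse_accessions(entry_text: str) -> List[str]:
    """Collect accessions from all AC lines, in the order they appear."""
    accessions: List[str] = []
    for line in entry_text.splitlines():
        if line.startswith("AC "):
            # Drop the two-letter line code, then split on ';'.
            for part in line[2:].split(";"):
                part = part.strip()
                if part:
                    accessions.append(part)
    return accessions


def primary_accession(entry_text: str) -> str:
    """Return the first (primary) accession, intended for use as src_id."""
    accessions = parse_accessions(entry_text)
    if not accessions:
        raise ValueError("no AC line found in entry")
    return accessions[0]


# Sample AC lines; XXXXX1 and XXXXX2 are placeholder secondary accessions.
sample = "AC   P13232; XXXXX1;\nAC   XXXXX2;\n"
src_id = primary_accession(sample)
```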

Consulting the UniProt documentation and using the IL7_HUMAN example mentioned above, one could find the following attribute equivalences:

[image]

(The UniProt attribute descriptions refer to the entries for P13232/IL7_HUMAN as they are listed in the text view format at