Open ncatlett opened 6 years ago
I agree with @ncatlett.
In the case of UniProt, the src_id should equal the first (primary) UniProt accession number. As to the other attributes, there are several options - which ones to choose will depend on the business rules for processing UniProt attributes.
Consulting the UniProt documentation and using the IL7_HUMAN example mentioned above, one could find the following attribute equivalences. (The UniProt attribute descriptions refer to the entries for P13232/IL7_HUMAN as they are listed in the text view format at
It would be useful for the primary accession to be used as the src_id ("canonical id from source database - used in the url_template") to enable conversions from BEL networks to SwissProt accessions as well as use of the url template.
I'd suggest mapping as follows (but feel most strongly about the src_id, label, and expansion of the synonyms to the protein names):

- src_id - use the primary accession (1st accession in the list associated with an entry)
- namespace_value - use the entry name/mnemonic identifier (e.g., IL7_HUMAN)
- id - e.g., SP:IL7_HUMAN
- label - use the entry name/mnemonic identifier (e.g., IL7_HUMAN)
- name - use the protein recommended name (e.g., Interleukin-7)
- description - omit, or reuse protein name for completeness
- synonyms - should include protein names (recommended names short/full, alternative names short/full) and potentially gene names (synonyms)
- Entry information documentation - https://www.uniprot.org/help/entry_information_section
- Protein names documentation - https://www.uniprot.org/help/protein_names
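
To make the suggested mapping concrete, here is a minimal Python sketch. It is not taken from the existing namespace-building code: the `UniProtEntry` class, the `to_namespace_term` function, and the `URL_TEMPLATE` value are all hypothetical, and the sketch assumes the UniProt entry has already been parsed into plain fields. Apart from P13232, IL7_HUMAN, and Interleukin-7, the sample values (the secondary accession in particular) are illustrative placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Placeholder only; the real url_template for the namespace is not shown in this thread.
URL_TEMPLATE = "https://example.org/uniprot/{src_id}"


@dataclass
class UniProtEntry:
    """Hypothetical, already-parsed view of one UniProt entry."""
    entry_name: str                      # mnemonic identifier, e.g. IL7_HUMAN
    accessions: List[str]                # primary accession first, then secondary ones
    recommended_full: str                # recommended protein name (full)
    recommended_short: List[str] = field(default_factory=list)
    alternative_names: List[str] = field(default_factory=list)
    gene_names: List[str] = field(default_factory=list)


def to_namespace_term(entry: UniProtEntry, prefix: str = "SP",
                      include_gene_names: bool = True) -> Dict[str, object]:
    """Apply the mapping proposed in this comment to a single entry."""
    if not entry.accessions:
        raise ValueError("entry has no accession numbers")

    # Synonyms: protein names first, then (optionally) gene names, without duplicates.
    candidates = [entry.recommended_full, *entry.recommended_short,
                  *entry.alternative_names]
    if include_gene_names:
        candidates += entry.gene_names
    synonyms: List[str] = []
    for name in candidates:
        if name and name not in synonyms:
            synonyms.append(name)

    return {
        "src_id": entry.accessions[0],           # primary (first) accession
        "namespace_value": entry.entry_name,
        "id": f"{prefix}:{entry.entry_name}",
        "label": entry.entry_name,
        "name": entry.recommended_full,
        "synonyms": synonyms,
    }


# Example using the IL7_HUMAN entry discussed above.
il7 = UniProtEntry(
    entry_name="IL7_HUMAN",
    accessions=["P13232", "X00000"],   # "X00000" is a made-up secondary accession
    recommended_full="Interleukin-7",
    recommended_short=["IL-7"],        # illustrative short name
    gene_names=["IL7"],                # illustrative gene name
)
term = to_namespace_term(il7)
url = URL_TEMPLATE.format(src_id=term["src_id"])
```

The description attribute is left out here, which corresponds to the "omit" option; reusing the protein name would be a one-line addition. Whether gene names belong in the synonyms is kept as a flag because that part of the proposal is only tentative.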