Open katewarner opened 4 months ago
At the weekly meeting we decided that we will not use the EC numbers as names. Instead, they will become part of the functional annotation section.
We need to solve one problem. We could make the text of the function "EC:[ec number] - [accepted name]". However, links to CAZy or Expasy Enzyme cannot be the evidence links because they did not provide this annotation; UniProt did. So UniProt is still the evidence badge. Question for @katewarner, @rykahsay and @sujeetvkulkarni: how can we integrate the CAZy or Expasy Enzyme links?
We could have a separate EC number array in the JSON. Each entry has:
The frontend would know that these need to be added to the function part and formatted as "EC:[ec number] - [accepted name] (see Expasy Enzyme link or CAZy link)", with the evidence badge taken from the evidence array.
This will require changing the JSON structure of the protein details.
@pkay47 --- why are some Brenda xrefs (EC numbers) not integrated into UniProt? For example, there is a Brenda xref connecting P22674 and "ec-3.2.2.27", as shown below:
$ cat downloads/ebi/current/uniprot-proteome-homo-sapiens.nt | grep "3\.2\.2\.27" | grep P22674
<http://purl.uniprot.org/uniprot/P22674> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://purl.uniprot.org/brenda/3.2.2.27> .
But, as shown below, there is no "http://purl.uniprot.org/core/enzyme" predicate connecting "P22674" and "ec-3.2.2.27"
$ cat downloads/ebi/current/uniprot-proteome-homo-sapiens.nt | grep "3\.2\.2\.27" | grep "<http://purl.uniprot.org/core/enzyme>"
<http://purl.uniprot.org/uniprot/P13051> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/A0A8V8TPS1> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/A0A8V8TQ66> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/A0A8V8TNE1> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/A0A8V8TNJ5> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/A0A8V8TNW2> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
<http://purl.uniprot.org/uniprot/F5GYA2> <http://purl.uniprot.org/core/enzyme> <http://purl.uniprot.org/enzyme/3.2.2.27> .
The http://purl.uniprot.org/core/enzyme predicate is present when the EC number is in the protein names. UniProt help: https://www.uniprot.org/help/protein_names
xref identifiers can be anything: a UniProt accession, an xref_db_specific_id, or an EC number. It depends on the xref database. The Brenda xref_id contains the EC number.
P22674 has a Brenda xref but no EC number in its name, so it has no http://purl.uniprot.org/core/enzyme triple.
It looks like all of the human entries are reviewed UniProtKB entries, which means a curator has looked at the entries and doesn't think there is enough evidence to support the EC numbers. Brenda (https://www.brenda-enzymes.org/advanced.php), on the other hand, is interested in enzyme families and EC numbers, and likely uses large-scale analyses to map the EC numbers to proteins. This is why the UniProtKB entries don't have an EC number in the entry but do have a Brenda EC xref.
So my suggestion, for enzymes in all organisms, would be to only display EC numbers in the Function section of GlyGen if they are in UniProtKB but keep the Brenda xrefs in the cross-references section of GlyGen, since future studies may determine that they are enzymes. But we can discuss this during the general meeting.
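As a rough illustration of that suggestion, the sketch below splits the EC numbers found in N-Triples lines into those asserted through the core/enzyme predicate (candidates for the Function section) and those that appear only as Brenda xrefs (cross-references only). The function name `classify_ec` and the output layout are hypothetical, not part of the GlyGen code base; the predicates and URI prefixes are the ones visible in the grep output in this thread.

```python
import re
from collections import defaultdict

# Predicates as they appear in the UniProt .nt file quoted in this thread.
SEE_ALSO = "<http://www.w3.org/2000/01/rdf-schema#seeAlso>"
ENZYME = "<http://purl.uniprot.org/core/enzyme>"

# Matches triples whose subject is a UniProt accession and whose object is a URI.
TRIPLE = re.compile(
    r"^<http://purl\.uniprot\.org/uniprot/([^>]+)> (<[^>]+>) <([^>]+)> \.$"
)


def classify_ec(lines):
    """Return {accession: {"function": set, "xref_only": set}} of EC numbers.

    Hypothetical helper: "function" holds EC numbers asserted with
    core/enzyme, "xref_only" holds EC numbers seen only as Brenda xrefs.
    """
    curated = defaultdict(set)
    brenda = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # literals, blank nodes, non-protein subjects
        acc, pred, obj = match.groups()
        if pred == ENZYME and obj.startswith("http://purl.uniprot.org/enzyme/"):
            curated[acc].add(obj.rsplit("/", 1)[1])
        elif pred == SEE_ALSO and obj.startswith("http://purl.uniprot.org/brenda/"):
            brenda[acc].add(obj.rsplit("/", 1)[1])
    return {
        acc: {"function": curated[acc], "xref_only": brenda[acc] - curated[acc]}
        for acc in set(curated) | set(brenda)
    }
```

Fed the two kinds of triples quoted earlier, this would put 3.2.2.27 under "function" for P13051 and under "xref_only" for P22674.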
The downloaded nt files from EBI do not give the connection between Rhea reaction IDs and EC numbers. This means that for a given EC number such as "2.1.1.45", I cannot create the evidence URL https://www.rhea-db.org/rhea/?query=ec:2.1.1.45 unless I know Rhea has a reaction ID mapping to "2.1.1.45".
In the future, I want @pkay47 to add a predicate that connects Rhea/Reactome/... reaction IDs with EC numbers.
For now, I am creating a new dataset file as follows:
Input: downloads/rhea/current/rhea-ec-iubmb.tsv
Input_readme: downloads/rhea/current/README
Output: reviewed/protein_reaction2ec_rhea.csv
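A minimal sketch of the check described above: only emit a Rhea evidence entry when the EC number is known to Rhea. The column layout of rhea-ec-iubmb.tsv is not shown in this thread, so the header name passed as `ec_column` is an assumption that has to be taken from the README; both function names are hypothetical.

```python
import csv

RHEA_URL = "https://www.rhea-db.org/rhea/?query=ec:{ec}"


def load_ec_numbers(handle, ec_column):
    """Collect EC numbers from a tab-separated file that has a header row.

    ec_column is an assumed header name; check the Rhea README for the
    real one before using this on rhea-ec-iubmb.tsv.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[ec_column].strip() for row in reader if row.get(ec_column)}


def rhea_evidence(ec_number, known_ecs):
    """Return a Rhea evidence entry, or None if Rhea has no mapping for the EC."""
    if ec_number not in known_ecs:
        return None
    return {
        "id": ec_number,
        "database": "Rhea",
        "url": RHEA_URL.format(ec=ec_number),
    }
```

Usage would look like `known = load_ec_numbers(open(path, newline=""), "EC")` followed by `rhea_evidence("2.1.1.45", known)`, where "EC" stands in for whatever the real column is called.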
With this, the protein detail APIs will have a new property called "enzyme_annotation" (example for P04818 is shown below)
"enzyme_annotation": [
  {
    "ec_number": "2.1.1.45",
    "ec_name": "(6R)-5,10-methylene-5,6,7,8-tetrahydrofolate + dUMP = 7,8-dihydrofolate + dTMP.",
    "evidence": [
      {
        "id": "2.1.1.45",
        "database": "Rhea",
        "url": "https://www.rhea-db.org/rhea/?query=ec:2.1.1.45"
      },
      {
        "id": "2.1.1.45",
        "database": "BRENDA Enzymes",
        "url": "https://www.brenda-enzymes.org/enzyme.php?ecno=2.1.1.45"
      }
    ]
  }
]
The API now has the "enzyme_annotation" section.
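For the frontend side, here is one way an "enzyme_annotation" entry could be turned into the "EC:[ec number] - [accepted name]" text plus a list of evidence badges. This is only a sketch under the assumption that each entry has the three keys shown in the P04818 example; the function name `format_enzyme_annotation` is hypothetical and the real frontend may render this differently.

```python
def format_enzyme_annotation(entry):
    """Return (function_text, badges) for one enzyme_annotation entry.

    function_text follows the "EC:[ec number] - [accepted name]" pattern
    discussed in this thread; badges is a list of (database, url) pairs
    taken from the entry's evidence array.
    """
    text = f"EC:{entry['ec_number']} - {entry['ec_name']}"
    badges = [(ev["database"], ev["url"]) for ev in entry.get("evidence", [])]
    return text, badges


# Entry copied from the P04818 example in this thread.
example = {
    "ec_number": "2.1.1.45",
    "ec_name": "(6R)-5,10-methylene-5,6,7,8-tetrahydrofolate + dUMP = 7,8-dihydrofolate + dTMP.",
    "evidence": [
        {"id": "2.1.1.45", "database": "Rhea",
         "url": "https://www.rhea-db.org/rhea/?query=ec:2.1.1.45"},
        {"id": "2.1.1.45", "database": "BRENDA Enzymes",
         "url": "https://www.brenda-enzymes.org/enzyme.php?ecno=2.1.1.45"},
    ],
}

text, badges = format_enzyme_annotation(example)
```

Keeping the text and the badges separate leaves it to the page layout to decide where the evidence links are shown.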
@rykahsay could you please create a ticket for 'add a predicate that connects Rhea/Reactome/... reaction IDs with EC-numbers', including the predicate name and which databases are required? Also, please update the data model: https://docs.google.com/document/d/1MOtPk2wTVb2EL-u1DD2T86J-JX4nQTKPRdScakG-X08/edit#
Added:
@pkay47 ... I don't see the triples
$ cat downloads/ebi/current/uniprot-proteome-homo-sapiens.nt | grep hasEnzyme
@rykahsay was planning the change to go in 2024_04 datasets, release is on 24-jul.
Do you want the current release to be updated? Only human or all datasets?
I found an issue with the EC numbers on the protein pages and I have some suggestions for changes to how the EC numbers are displayed on the front end that we could discuss during the general meeting: