Orthology API - example & question

lpalbou commented 6 years ago

Basic example (provided by Anushya):

http://panthertest2.usc.edu/services/ortholog/matchOrtho.jsp?format=json&organism=9606&targetOrganism=10090,9913,83333&geneInputList=brca1&orthologType=all

It reads as: "given the gene brca1 from species 9606 (human), give me the orthologues (if any) in species 10090 (mouse), 9913 (bovine) and 83333 (E. coli)"
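For anyone who wants to build the same query programmatically, here is a minimal sketch using only the Python standard library. The endpoint and parameter names are taken from the URL above; the helper name `build_ortholog_url` is hypothetical, and the sketch only builds the URL, it does not call the test server.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL above (a test server, may not stay up).
BASE_URL = "http://panthertest2.usc.edu/services/ortholog/matchOrtho.jsp"


def build_ortholog_url(gene, organism, target_organisms, ortholog_type="all"):
    """Build a matchOrtho query URL (hypothetical helper, not part of the API)."""
    params = {
        "format": "json",
        "organism": organism,
        # Target species are passed as one comma-separated list of NCBI taxon IDs.
        "targetOrganism": ",".join(str(t) for t in target_organisms),
        "geneInputList": gene,
        "orthologType": ortholog_type,
    }
    # safe="," keeps the commas unescaped, matching the example URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = build_ortholog_url("brca1", 9606, [10090, 9913, 83333])
print(url)
```

The resulting URL can then be fetched with any HTTP client.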

However, the result of that query gives:

{"search": {
    "product": {
        "version": 13.1,
        "content": "PANTHERDB"
    },
    "mapping": {"mapped": [
        {
            "target_gene_symbol": "Brca1",
            "ortholog": "LDO",
            "gene": "HUMAN|HGNC=1100|UniProtKB=P38398",
            "target_gene": "MOUSE|MGI=MGI=104537|UniProtKB=P48754",
            "id": "brca1"
        },
        {
            "target_gene_symbol": "BRCA1",
            "ortholog": "LDO",
            "gene": "HUMAN|HGNC=1100|UniProtKB=P38398",
            "target_gene": "BOVIN|Gene=BRCA1|UniProtKB=F1MYX8",
            "id": "brca1"
        },
        {
            "target_gene_symbol": "BRCA1",
            "ortholog": "LDO",
            "gene": "HUMAN|HGNC=1100|UniProtKB=P38398",
            "target_gene": "BOVIN|Gene=BRCA1|UniProtKB=Q864U1",
            "id": "brca1"
        }
    ]},
    "search_type": "matching ortholog info"
}}

The second and third results are both from species BOVIN (9913), with the same gene but different UniProt IDs. Their protein sequences are nearly identical, differing at one residue (around position 1251). They share the same UniGene and GeneID but have different RefSeqs. F1MYX8 has a predicted sequence (https://www.ncbi.nlm.nih.gov/nuccore/XM_010816216.2) whereas Q864U1 has experimental evidence (https://www.ncbi.nlm.nih.gov/nuccore/NM_178573.1) and is shorter.
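A quick way to spot this kind of duplicate in a response is to group the target proteins by species and gene. The sketch below is only an illustration: it assumes the "target_gene" values always have the three-part `SPECIES|gene|protein` shape seen above, and the function name is made up.

```python
from collections import defaultdict

# "target_gene" values copied from the response above.
mapped = [
    {"target_gene": "MOUSE|MGI=MGI=104537|UniProtKB=P48754"},
    {"target_gene": "BOVIN|Gene=BRCA1|UniProtKB=F1MYX8"},
    {"target_gene": "BOVIN|Gene=BRCA1|UniProtKB=Q864U1"},
]


def proteins_by_species_gene(entries):
    """Group UniProt accessions by (species, gene); hypothetical helper."""
    groups = defaultdict(list)
    for entry in entries:
        # Assumes exactly three "|"-separated parts, as in the sample output.
        species, gene, protein = entry["target_gene"].split("|")
        groups[(species, gene)].append(protein.split("=", 1)[1])
    return groups


# Keep only the genes that come back with more than one protein.
duplicates = {
    key: accessions
    for key, accessions in proteins_by_species_gene(mapped).items()
    if len(accessions) > 1
}
print(duplicates)
```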

So I suppose the question is: why do we get the result F1MYX8, which is the outdated version of Q864U1, and how can we correct that?

Also, could we rename the current "id" field to "name" or "gene_name", and make "id" a URI for the gene? It would help interoperability with other tools (especially for data commons).

cmungall commented 6 years ago

Yes, all IDs should be CURIEs with registered prefixes, and let's hide the PANTHER-specific "=" separators.

thomaspd commented 6 years ago

I agree we can change the "=" to ":" in the output. Anushya, can you please do that? Chris, do we also need to solve the "MGI:MGI:" issue and convert those to "MGI:"?
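To make the proposed change concrete, here is a rough sketch of the conversion on the client side. It is not how PANTHER would implement it; the function name `to_curie` and the `collapse_repeated_prefix` flag are hypothetical, and collapsing "MGI=MGI=" to a single "MGI:" assumes the answer to the question above is yes.

```python
def to_curie(panther_id, collapse_repeated_prefix=True):
    """Turn a PANTHER 'PREFIX=local' identifier into 'PREFIX:local'.

    Hypothetical helper. With collapse_repeated_prefix=True, an ID such as
    'MGI=MGI=104537' becomes 'MGI:104537'; with False it becomes
    'MGI:MGI:104537'.
    """
    prefix, _, local = panther_id.partition("=")
    if collapse_repeated_prefix and local.startswith(prefix + "="):
        # Drop the second copy of the prefix and its "=" separator.
        local = local[len(prefix) + 1:]
    # Any remaining "=" inside the local part is also a PANTHER separator.
    return f"{prefix}:{local.replace('=', ':')}"


print(to_curie("HGNC=1100"))
print(to_curie("MGI=MGI=104537"))
print(to_curie("MGI=MGI=104537", collapse_repeated_prefix=False))
```

Note that this only rewrites the separators. An ID like "Gene=BRCA1" would still come out as "Gene:BRCA1", and "Gene" would need a registered prefix to count as a proper CURIE.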

Laurent, regarding the two cow orthologs, PANTHER uses the reference proteome sets generated by UniProt, so this must be an issue with UniProt. This may have been fixed in the year since we downloaded these from UniProt. You can check whether they are still in the reference proteome set by looking in the Miscellaneous (last) section of the UniProt record for the reference proteome keyword.

About "id", maybe we should redesign the output fields to separate species (e.g. HUMAN, for which we could use NCBI Taxon IDs), gene ID (e.g. HGNC), and protein ID (UniProtKB), rather than using the PANTHER concatenated name in "target_gene".
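As a sketch of what the separated fields might look like, the snippet below splits the concatenated name into a small record. The field names, the `GeneRecord` class and the `split_long_id` helper are all made up for illustration; the species-code-to-taxon table only covers the three species whose codes and taxon IDs both appear in this thread.

```python
from dataclasses import dataclass

# Species codes and NCBI taxon IDs as they appear in the example above.
TAXON_BY_CODE = {"HUMAN": 9606, "MOUSE": 10090, "BOVIN": 9913}


@dataclass
class GeneRecord:
    """Hypothetical output shape with species, gene and protein separated."""
    taxon_id: int
    gene_id: str
    protein_id: str


def split_long_id(long_id):
    """Split 'SPECIES|gene|protein' into its parts (hypothetical helper)."""
    # Assumes exactly three "|"-separated parts, as in the sample output.
    species, gene, protein = long_id.split("|")
    return GeneRecord(TAXON_BY_CODE[species], gene, protein)


record = split_long_id("HUMAN|HGNC=1100|UniProtKB=P38398")
print(record)
```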

geneontology / go-api

Orthology API - example & question #8