bridgedb / BridgeDb

The BridgeDb Library source code
https://bridgedb.org/
Apache License 2.0

Support Taxonomy for organisms? #34

Open ariutta opened 7 years ago

ariutta commented 7 years ago

Would anyone find it useful for BridgeDb to support the NCBI Taxonomy for identifying organisms? Some time ago, I mapped each of our supported organisms to its Taxonomy IRI, but I'm not sure where this belongs.

{
    "Anopheles gambiae": "http://identifiers.org/taxonomy/7165",
    "Arabidopsis thaliana": "http://identifiers.org/taxonomy/3702",
    "Aspergillus niger": "http://identifiers.org/taxonomy/5061",
    "Bacillus subtilis": "http://identifiers.org/taxonomy/1423",
    "Bos taurus": "http://identifiers.org/taxonomy/9913",
    "Caenorhabditis elegans": "http://identifiers.org/taxonomy/6239",
    "Canis familiaris": "http://identifiers.org/taxonomy/9615",
    "Ciona intestinalis": "http://identifiers.org/taxonomy/7719",
    "Danio rerio": "http://identifiers.org/taxonomy/7955",
    "Drosophila melanogaster": "http://identifiers.org/taxonomy/7227",
    "Escherichia coli": "http://identifiers.org/taxonomy/562",
    "Equus caballus": "http://identifiers.org/taxonomy/9796",
    "Gallus gallus": "http://identifiers.org/taxonomy/9031",
    "Gibberella zeae": "http://identifiers.org/taxonomy/5518",
    "Glycine max": "http://identifiers.org/taxonomy/3847",
    "Homo sapiens": "http://identifiers.org/taxonomy/9606",
    "Hordeum vulgare": "http://identifiers.org/taxonomy/4513",
    "Macaca mulatta": "http://identifiers.org/taxonomy/9544",
    "Mus musculus": "http://identifiers.org/taxonomy/10090",
    "Mycobacterium tuberculosis": "http://identifiers.org/taxonomy/1773",
    "Ornithorhynchus anatinus": "http://identifiers.org/taxonomy/9258",
    "Oryza indica": "http://identifiers.org/taxonomy/39946",
    "Oryza sativa": "http://identifiers.org/taxonomy/4530",
    "Oryza sativa Indica Group": "http://identifiers.org/taxonomy/39946",
    "Populus trichocarpa": "http://identifiers.org/taxonomy/3694",
    "Pan troglodytes": "http://identifiers.org/taxonomy/9598",
    "Rattus norvegicus": "http://identifiers.org/taxonomy/10116",
    "Saccharomyces cerevisiae": "http://identifiers.org/taxonomy/4932",
    "Solanum lycopersicum": "http://identifiers.org/taxonomy/4081",
    "Sus scrofa": "http://identifiers.org/taxonomy/9823",
    "Vitis vinifera": "http://identifiers.org/taxonomy/29760",
    "Xenopus tropicalis": "http://identifiers.org/taxonomy/8364",
    "Zea mays": "http://identifiers.org/taxonomy/4577"
}

AlasdairGray commented 7 years ago

This is really entity resolution, i.e. going from free text to an IRI. Within the Open PHACTS context we used a different service for this. However, there is no reason that BridgeDb couldn't expand its functionality into this area. Similar data could be gathered for proteins in the UniProt data, gene names, etc.
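
To make the "free text to IRI" step concrete, here is a minimal sketch of a name lookup over a few entries from the mapping above. It is not part of BridgeDb: the function name `resolve_organism` and the normalization rule (collapse whitespace, ignore case) are assumptions chosen for illustration, and a real resolver would also need synonyms and common names.

```python
from typing import Optional

# A few entries copied from the mapping posted above; the full table
# would be loaded the same way.
TAXONOMY_IRIS = {
    "Homo sapiens": "http://identifiers.org/taxonomy/9606",
    "Mus musculus": "http://identifiers.org/taxonomy/10090",
    "Oryza sativa Indica Group": "http://identifiers.org/taxonomy/39946",
}


def _normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case (assumed matching rule)."""
    return " ".join(name.split()).casefold()


# Index keyed by the normalized name so lookups tolerate case and spacing.
_INDEX = {_normalize(name): iri for name, iri in TAXONOMY_IRIS.items()}


def resolve_organism(name: str) -> Optional[str]:
    """Hypothetical helper: return the Taxonomy IRI for a name, or None."""
    return _INDEX.get(_normalize(name))
```

Returning `None` for unknown names keeps the caller in charge of what to do when free text cannot be resolved.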

ariutta commented 7 years ago

I'm open to moving it anywhere people would find it useful. I'm just removing it from the bridgedbjs code, because it's too complex for JavaScript.

ariutta commented 7 years ago

How about adding it as a new column to the TSV file organisms.txt?
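
A rough sketch of what appending such a column could look like. The actual layout of organisms.txt is not shown in this thread, so the sketch assumes tab-separated rows with the Latin name in the first column; `add_taxonomy_column` and its `name_col` parameter are hypothetical names, not existing BridgeDb code.

```python
import csv
import io


def add_taxonomy_column(tsv_text, iris, name_col=0):
    """Append a Taxonomy IRI column to TSV text.

    Assumes (not confirmed here) that each row holds the organism's
    Latin name in column `name_col`. Rows without a known IRI get an
    empty value so every row keeps the same number of columns.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row + [iris.get(row[name_col], "")])
    return out.getvalue()
```

For example, with `{"Homo sapiens": "http://identifiers.org/taxonomy/9606"}` as the mapping, a row `Homo sapiens<TAB>Hs` gains the IRI as a third field, while a row for an unmapped organism gains an empty third field.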