SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

Retrieve database links for all gene IDs in a BiGG model for ID mapping #325

Open janstrauss1 opened 5 years ago

janstrauss1 commented 5 years ago

Hi there,

I apologize in advance for my newbie question, but I am currently stuck on an ID mapping issue.

I am trying to retrieve the external database IDs for all genes in a BiGG model so that I can map BiGG gene IDs to UniProt IDs for downstream Gene Set Enrichment Analysis using reporter metabolites.

I have tried the BiGG Web API using curl 'http://bigg.ucsd.edu/api/v2/models/iML1515/genes', but this does not return database_links for the genes. External database links and IDs only seem to be returned when a single gene, metabolite, or reaction is requested, e.g. curl 'http://bigg.ucsd.edu/api/v2/models/iML1515/genes/b0002'.

How can I retrieve external database ID mappings for all gene IDs in a BiGG model?

Many thanks in advance for your help!

Jan

draeger commented 5 years ago

If working with the SBML version of the model is an option, you could alternatively download the model from http://bigg.ucsd.edu/static/models/iML1515.xml.gz, decompress it, and go through the listOfGeneProducts. There you can find links to external databases. Each link starts with the prefix http://identifiers.org/ followed by a catalog identifier (e.g., ncbigene) and the actual identifier, separated by a slash (/), e.g.,

    <fbc:listOfGeneProducts xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2">
      <fbc:geneProduct fbc:id="G_b2551" fbc:label="b2551" metaid="G_b2551">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#G_b2551">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/uniprot/P0A825" />
                </rdf:Bag>
              </bqbiol:is>
              <bqbiol:isEncodedBy>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/asap/ABE-0008389" />
                  <rdf:li rdf:resource="http://identifiers.org/ecogene/EG10408" />
                  <rdf:li rdf:resource="http://identifiers.org/ncbigene/947022" />
                  <rdf:li rdf:resource="http://identifiers.org/ncbigi/gi:16130476" />
                </rdf:Bag>
              </bqbiol:isEncodedBy>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </fbc:geneProduct>
      <!-- ... -->

Please note that the qualifiers (in this example bqbiol:is and bqbiol:isEncodedBy) give you information about the relationship between the particular geneProduct and the external references.
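For anyone who wants to script this, here is a minimal Python sketch using only the standard library. It assumes the whole file follows the layout of the excerpt above (same fbc and rdf namespaces, links under http://identifiers.org/). The function name gene_links is just an illustrative helper, not part of any BiGG tooling.

```python
import xml.etree.ElementTree as ET

# Namespaces copied from the excerpt above; other SBML files may use
# a different fbc version, in which case FBC_NS needs adjusting.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PREFIX = "http://identifiers.org/"


def gene_links(sbml_source):
    """Map each geneProduct label to a list of
    (qualifier, catalog, identifier) tuples.

    sbml_source is a file name or an open (binary) file object.
    """
    links = {}
    for gp in ET.parse(sbml_source).iter(f"{{{FBC_NS}}}geneProduct"):
        label = gp.get(f"{{{FBC_NS}}}label")
        rows = []
        for description in gp.iter(f"{{{RDF_NS}}}Description"):
            # Direct children are the qualifiers, e.g. bqbiol:is.
            for qualifier in description:
                qualifier_name = qualifier.tag.rpartition("}")[2]
                for li in qualifier.iter(f"{{{RDF_NS}}}li"):
                    uri = li.get(f"{{{RDF_NS}}}resource", "")
                    if uri.startswith(PREFIX):
                        # Split "catalog/identifier" at the first slash.
                        catalog, _, identifier = uri[len(PREFIX):].partition("/")
                        rows.append((qualifier_name, catalog, identifier))
        links[label] = rows
    return links
```

The downloaded file can be read without unpacking it first by passing gzip.open("iML1515.xml.gz") as sbml_source. Filtering the tuples for the catalog "uniprot" should then give the gene-to-UniProt mapping asked about above.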

zakandrewking commented 5 years ago

Thanks for making an issue for this. These identifiers are in SBML, but we should also include them in JSON model downloads and on the Data Access page (e.g. with a genes flat file), so I'm going to leave this open for those new features.
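Until then, a possible workaround with the Web API is to list the genes of a model and request each gene individually, since the per-gene endpoint is the one reported above to return database_links. The following is only a rough sketch. The "results" and "bigg_id" field names in the listing response are assumptions about the JSON layout and should be checked against a real response. fetch_json and gene_database_links are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API = "http://bigg.ucsd.edu/api/v2"


def fetch_json(url):
    """Download a URL and decode the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def gene_database_links(model_id, fetch=fetch_json):
    """Return {gene ID: database_links} for every gene in a model.

    fetch can be swapped out, e.g. for a caching or rate-limited
    downloader, or for a stub when testing offline.
    """
    model_url = f"{API}/models/{quote(model_id)}"
    listing = fetch(f"{model_url}/genes")
    links = {}
    # ASSUMPTION: the listing holds its genes under "results" and
    # each entry carries its ID under "bigg_id".
    for gene in listing["results"]:
        gene_id = gene["bigg_id"]
        detail = fetch(f"{model_url}/genes/{quote(gene_id)}")
        # "database_links" is the field named in the question above.
        links[gene_id] = detail.get("database_links", {})
    return links
```

This sends one request per gene, so it is slow for a whole model and puts load on the server. Parsing the SBML file is probably the better option for bulk mapping.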