glasgowcompbio / pyMultiOmics

Python toolbox for multi-omics data mapping and analysis
MIT License

Retrieving additional information for an entity #15

Closed: joewandy closed this issue 3 years ago

joewandy commented 3 years ago

As part of issue #11, I added some code to retrieve the extra information of a node. This is the same code, transplanted with slight modifications from GraphOmics, that shows the additional information in the bottom panel.

@kmcluskey let me know if you need other info that is not retrieved here, and we can try to add it. @RonanDaly I tagged you anyway in case you want to be involved with developing this package 😁

Two ways of using it:

  1. Directly calling the get_info method -- works best if you have the list of ids to query. For each id, a dictionary having three keys is produced: infos, links, and images.

infos is a list of key/value pairs, e.g. {'key': 'EC Number', 'value': 'EC6.2.1.17'}
links is a list of additional links to outside resources
images is a list of image links, if available

Example code below retrieves the additional information of the entities in entity_ids:

from pyMultiOmics.info import get_info

entity_ids = ['ENSDARG00000091254', 'F1QAA7', '15378', 'R-DRE-469659', 'R-DRE-174403']
data_types = ['genes', 'proteins', 'compounds', 'reactions', 'pathways']
for entity_id, data_type in zip(entity_ids, data_types):
    print(entity_id, data_type)
    print(get_info(entity_id, data_type))
    print()

Output:

ENSDARG00000091254 genes
{'infos': [{'key': 'Description', 'value': 'si:ch73-59p9.2 '}, {'key': 'Species', 'value': 'danio_rerio'}], 'links': [{'text': 'Link to Ensembl', 'href': 'https://www.ensembl.org/id/ENSDARG00000091254'}, {'text': 'Link to GeneCard', 'href': 'https://www.genecards.org/cgi-bin/carddisp.pl?gene=si:ch73-59p9.2'}, {'text': 'Transcript: si:ch73-59p9.2-201', 'href': 'https://www.ensembl.org/id/ENSDART00000111526'}], 'images': []}

F1QAA7 proteins
{'infos': [{'key': 'Name', 'value': 'Propionate--CoA ligase'}, {'key': 'EC Number', 'value': 'EC6.2.1.17'}, {'key': 'Catalytic Activity', 'value': '\n\nATP + CoA + propanoate = AMP + diphosphate + propanoyl-CoA\n\n\n\n\n\n\n\n\n\n\n\n\n'}, {'key': 'Catalytic Activity', 'value': '\n\nacetate + ATP + CoA = acetyl-CoA + AMP + diphosphate\n\n\n\n\n\n\n\n\n\n\n\n\n'}, {'key': 'Gene_ontologies', 'value': 'acetate-CoA ligase activity; propionate-CoA ligase activity'}], 'links': [{'text': 'Link to UniProt', 'href': 'http://www.uniprot.org/uniprot/F1QAA7'}, {'text': 'Link to SWISS-MODEL', 'href': 'https://swissmodel.expasy.org/repository/uniprot/F1QAA7'}], 'images': []}

15378 compounds
{'infos': [{'key': 'PiMP Peak ID', 'value': 'None'}, {'key': 'KEGG ID', 'value': C00080}, {'key': 'FORMULA', 'value': H}, {'key': 'ChEBI ID', 'value': CHEBI:15378}, {'key': 'Definition', 'value': The general name for the hydrogen nucleus, to be used without regard to the hydrogen nuclear mass (either for hydrogen in its natural abundance or where it is not desired to distinguish between the isotopes).}, {'key': 'Monoisotopic Mass', 'value': 1.008}, {'key': 'SMILES', 'value': [H+]}, {'key': 'Inchi', 'value': InChI=1S/p+1}, {'key': 'InchiKey', 'value': ''}], 'images': ['http://www.ebi.ac.uk/chebi/displayImage.do?defaultImage=true&imageIndex=0&chebiId=15378'], 'links': [{'text': 'Link to ChEBI database', 'href': 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15378'}, {'text': 'Link to KEGG COMPOUND database', 'href': 'http://www.genome.jp/dbget-bin/www_bget?cpd:C00080'}]}

R-DRE-469659 reactions
{'infos': [{'key': 'Summary', 'value': 'The conversion of testosterone to the most potent androgen, 5-alpha-dihydrotestosterone (DHT), is catalyzed by the microsomal 5alpha-steroid reductase enzymes, of which there are three reported types in humans to date (SRD5A1-3) (Andersson S and Russell DW, 1990, Andersson S et al, 1991, Uemura M et al, 2008 respectively). These enzymes are expressed in the prostate and other androgen target sites. Defects in SRD5A2 are the cause of pseudovaginal perineoscrotal hypospadias, also known as male pseudohermaphroditism (Anwar R et al, 1997). Corticotropin (Adrenocorticotropic hormone, ACTH) acts through the ACTH receptor called melanocortin receptor type 2 (MC2R) to stimulate steroidogenesis, increasing the production of androgens (McKenna et al, 1997).'}, {'key': 'Species', 'value': 'Danio rerio'}, {'key': 'Inferred', 'value': 'Inferred from Homo sapiens'}, {'key': 'Catalystactivity', 'value': '3-oxo-5-alpha-steroid 4-dehydrogenase activity of SRD5A1-3 [endoplasmic reticulum membrane]', 'url': ''}, {'key': 'Input', 'value': 'TEST [cytosol];NADPH [cytosol];H+ [cytosol]', 'url': 'https://reactome.org/content/detail/R-ALL-193057;https://reactome.org/content/detail/R-ALL-29364;https://reactome.org/content/detail/R-ALL-70106'}, {'key': 'Output', 'value': 'NADP+ [cytosol];DHTEST [cytosol]', 'url': 'https://reactome.org/content/detail/R-ALL-29366;https://reactome.org/content/detail/R-ALL-469662'}, {'key': 'Regulatedby', 'value': "Positive regulation by 'POMC(138-176) [cytosol]'", 'url': ''}], 'images': ['https://reactome.org/ContentService/exporter/diagram/R-DRE-469659.jpg?sel=R-DRE-469659&quality=7'], 'links': [{'text': 'Link to Reactome database', 'href': 'https://reactome.org/content/detail/R-DRE-469659'}]}

R-DRE-174403 pathways
{'infos': [{'key': 'Summary', 'value': 'The combination of glutamate, cysteine and ATP is required to form glutathione. The steps involved in the synthesis and recycling of glutathione are outlined (Meister, 1988).'}, {'key': 'Species', 'value': 'Danio rerio'}, {'key': 'Inferred', 'value': 'Inferred from Homo sapiens'}, {'key': 'Reactions', 'value': 'CHAC1,2 cleaves GSH to OPRO and CysGly;CNDP2:2Mn2+ dimer hydrolyses CysGly;GCL ligates L-Glu to L-Cys;GGCT transforms gGluCys to OPRO;GGT dimers hydrolyse GSH;GSS:Mg2+ dimer synthesizes GSH;OPLAH hydrolyses OPRO to L-Glu', 'url': 'https://reactome.org/content/detail/R-DRE-1247910;https://reactome.org/content/detail/R-DRE-1247922;https://reactome.org/content/detail/R-DRE-1247935;https://reactome.org/content/detail/R-DRE-174367;https://reactome.org/content/detail/R-DRE-174394;https://reactome.org/content/detail/R-DRE-6785928;https://reactome.org/content/detail/R-DRE-8943279'}], 'images': ['https://reactome.org/ContentService/exporter/diagram/R-DRE-174403.jpg?sel=R-DRE-174403'], 'links': [{'text': 'Link to Reactome database', 'href': 'https://reactome.org/content/detail/R-DRE-174403'}, {'text': 'SBML Export', 'href': 'https://reactome.org/ContentService/exporter/sbml/R-DRE-174403.xml'}]}
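
Since infos is a list of key/value pairs rather than a plain dictionary, and the same key can appear more than once (e.g. the two 'Catalytic Activity' entries for F1QAA7), it can be handy to collapse it before further processing. Below is a minimal sketch of one way to do that. Note that flatten_infos is a hypothetical helper written for illustration and is not part of pyMultiOmics; the sample dictionary is abridged from the F1QAA7 output above, with the padding newlines shortened.

```python
from collections import defaultdict


def flatten_infos(info):
    """Collapse the 'infos' list of {'key', 'value'} dicts into {key: [values]}.

    Hypothetical helper, not part of pyMultiOmics. Values are kept in a list
    because the same key may occur several times.
    """
    flat = defaultdict(list)
    for item in info.get('infos', []):
        value = item['value']
        # Some string values carry padding newlines, so strip them.
        if isinstance(value, str):
            value = value.strip()
        flat[item['key']].append(value)
    return dict(flat)


# Abridged from the F1QAA7 output shown above.
sample = {
    'infos': [
        {'key': 'Name', 'value': 'Propionate--CoA ligase'},
        {'key': 'EC Number', 'value': 'EC6.2.1.17'},
        {'key': 'Catalytic Activity',
         'value': '\n\nATP + CoA + propanoate = AMP + diphosphate + propanoyl-CoA\n\n'},
        {'key': 'Catalytic Activity',
         'value': '\n\nacetate + ATP + CoA = acetyl-CoA + AMP + diphosphate\n\n'},
    ],
    'links': [
        {'text': 'Link to UniProt', 'href': 'http://www.uniprot.org/uniprot/F1QAA7'},
    ],
    'images': [],
}

flat = flatten_infos(sample)
for key, values in flat.items():
    print(key, values)

# The links are already uniform {'text', 'href'} dicts, so they can be used directly.
for link in sample['links']:
    print(link['text'], '->', link['href'])
```

In real use, the sample dictionary would be replaced by the return value of get_info(entity_id, data_type).
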
  2. A new Info query which can be used as part of the QueryBuilder -- useful when chaining with other queries. The resulting information, as above, is now added as extra columns in the output dataframe.

Example code below retrieves the additional information of everything connected to the DE genes in the data:

res = QueryBuilder(ap) \
        .add(Select(GENES)) \
        .add(SignificantDE(case, control, pval, fc_lte=fc_lte, fc_gte=fc_gte, N=N)) \
        .add(Connected()) \
        .add(Info()) \
        .run()
res

Output: image