sidneymbell opened this issue
A related comment from Ben:
Another idea: it would be SUPER COOL if I could select two cell types that I know are adjacent to one another in the kidney, then click a button so that one cluster shows all the receptors it expresses and the other cluster all the ligands it expresses, so I could get at intercellular communication… that would be amazing…
In a practical sense, this could be very similar to the above idea for color-by-function (both involve filtering genes by function / GO classifications)
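A minimal sketch of the receptor/ligand view in Python, assuming we already have (a) GO term labels for each gene, fetched from one of the sources discussed in this thread, and (b) the set of genes counted as "expressed" in each selected cluster. The function names and the two term labels are hypothetical placeholders, not an existing cellxgene API, and the choice of terms would need review by a biologist.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical GO term labels used to tag genes; swap in whatever terms
# the chosen annotation source actually provides.
RECEPTOR_TERMS = {"signaling receptor activity"}
LIGAND_TERMS = {"signaling receptor binding"}


def genes_with_terms(
    expressed: Iterable[str],
    annotations: Dict[str, Set[str]],
    terms: Set[str],
) -> Set[str]:
    """Return the expressed genes annotated with at least one of `terms`."""
    return {g for g in expressed if annotations.get(g, set()) & terms}


def receptor_ligand_view(
    cluster_a: Iterable[str],
    cluster_b: Iterable[str],
    annotations: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Receptors expressed by cluster A and ligands expressed by cluster B."""
    receptors = genes_with_terms(cluster_a, annotations, RECEPTOR_TERMS)
    ligands = genes_with_terms(cluster_b, annotations, LIGAND_TERMS)
    return receptors, ligands
```

This only lists candidate receptors and ligands; pairing them up (e.g., Egf with Egfr) would need a curated ligand-receptor table rather than GO terms alone.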
This was raised as a desired feature in the most recent feedback session w/ Ben Humphreys' lab, led by @neuromusic w/ @fionagriffin. I think it's worth revisiting.
The suggested GO annotations have a REST API, available here. I believe this would satisfy the requirements for an API that @colinmegill has been enthusiastic about finding / searching for?
One option for a tooltip as suggested by Colin could look like this:
another API discovered w/ @colinmegill this afternoon: mygene.info by @andrewsu's group
the API query would involve (at least) two steps:
querying mygene.info for each gene in the Tabula Muris h5ad file resulted in...
example using the python wrapper for the API: https://gist.github.com/neuromusic/6ab7769c2030eec573b61b03a8021620
A few quick notes...
In addition to mg.query, you can also perform batch queries via mg.querymany, as described in https://pypi.org/project/mygene/
You can limit the returned data with the fields parameter (e.g., mg.querymany(['1500015L24Rik', '1500016L03Rik', 'Zhx1', 'Zrsr2'], scopes='symbol', fields='entrezgene,summary,symbol')).
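A sketch of the batch query suggested above, using the mygene Python client. Assumptions: species="mouse" (Tabula Muris is a mouse atlas), and keeping only the first hit returned for each symbol; fetch_summaries and summaries_from_hits are hypothetical helper names. Queries that match nothing come back with "notfound": True, so they are dropped rather than shown as empty tooltips.

```python
from typing import Dict, Iterable, List


def summaries_from_hits(hits: List[dict]) -> Dict[str, dict]:
    """Collapse querymany() hits into {query symbol: record}.

    querymany() returns one dict per hit; a symbol can match several genes,
    and unmatched queries carry "notfound": True. We keep the first hit
    returned per query and drop the misses.
    """
    out: Dict[str, dict] = {}
    for hit in hits:
        if hit.get("notfound"):
            continue
        out.setdefault(
            hit["query"],
            {
                "entrezgene": hit.get("entrezgene"),
                "symbol": hit.get("symbol"),
                "summary": hit.get("summary"),
            },
        )
    return out


def fetch_summaries(symbols: Iterable[str], species: str = "mouse") -> Dict[str, dict]:
    """Batch-query mygene.info (needs network access and `pip install mygene`)."""
    import mygene  # imported lazily so summaries_from_hits() works offline

    mg = mygene.MyGeneInfo()
    hits = mg.querymany(
        list(symbols),
        scopes="symbol",
        fields="entrezgene,summary,symbol",
        species=species,
    )
    return summaries_from_hits(hits)
```

Splitting the parsing from the network call keeps the mapping logic testable without hitting the API.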
Further questions and feedback are of course always welcome!
This is similar to what we're doing with Clustergrammer-JS's and Clustergrammer2's biology-specific features. Mousing over a gene row looks up the gene name and RefSeq description via the Harmonizome. Similarly, enrichment analysis is done via Enrichr. We have back-end (Python) and front-end (JavaScript) implementations for Enrichr.
Let us know if that sounds like what you would like to implement and if we can help.
Hi @cornhundred -- thank you so much for the suggestion!
We'll have to look into whether their license is compatible with ours, but I super appreciate the pointer! It looks like a great resource (and Clustergrammer looks like a cool tool :).
Hi @sidneymbell, feel free to contact the Ma'ayan lab about their licenses (I'm pretty sure they're permissive). Harmonizome license: https://github.com/MaayanLab/harmonizome/blob/master/LICENSE
We're glad you like Clustergrammer! The Clustergrammer2 widget we are working on has a lot of similarities with cellxgene: we're using regl and a Python back-end, and it's built for single-cell gene expression data. Feel free to check out the Clustergrammer2-notebooks repo (https://github.com/ismms-himc/clustergrammer2-notebooks) for some example workflows, e.g. this 2,700 PBMC scRNA-seq video: http://www.youtube.com/watch?v=BEPspcC7vIY
We would love feedback and I'm sure we will reach out to you all about cross-tool compatibility, etc. in the future :)
👍👍 (email reply to Nicolas Fernandez's Jul 22, 2019 comment on https://github.com/chanzuckerberg/cellxgene/issues/96)
Another option that was suggested today by the GO folks: https://github.com/biolink/ontobio
I haven't looked into this extensively, but it's got a permissive license (BSD-3)
We want to make gene functions discoverable from within the app by pulling in data from public databases.
cellxgene launch will need to map input var_names to gene identifiers. Many of the APIs listed below take care of mapping between the various naming schemes, but it is still possible that a user would input a matrix with names like zebra, in which case we should not try to fetch gene function data.
Initial landscaping surfaced quite a few options for data sources. I've highlighted some of the most appealing options below with pros/cons, but there are probably other good options out there. (See the appendix for a list of options I don't think are a good fit.)
NCBI gene database: Entrez API
URL: https://www.ncbi.nlm.nih.gov/gene
About: https://www.ncbi.nlm.nih.gov/books/NBK25501/
License: https://www.ncbi.nlm.nih.gov/home/about/policies/
Pros: Direct access to a wide range of frequently updated descriptive information on gene function in many species.
Cons: I haven't yet found a set of JS-based wrapper functions, although the Python API is quite robust.
Humanbase
URL: https://hb.flatironinstitute.org/api/
About: https://hb.flatironinstitute.org/about
License: CC-BY 4.0 (per direct communication; in the process of being added to the docs)
Diligence in progress: compatible licensing and methods validation
Pros: Surfaces interacting genes, functional processes, and tissue-specific expression. I would imagine that support from the Flatiron Institute is pretty stable?
Cons: License is not yet publicly documented on their site.
Gene Ontology Consortium: AmiGO (GOlr)
URL: http://wiki.geneontology.org/index.php/AmiGO_2_Manual:_JavaScript
About: https://link.springer.com/protocol/10.1007/978-1-4939-3743-1_11
License: Creative Commons Attribution 4.0 International License (CC BY 4.0)
Pros: Direct access to the most up-to-date gene ontologies. Offers an API for on-demand queries OR a direct download of the ontology file that could be packaged into each release (~8MB; the advantage is that this would not require an internet connection or sending information outside of the app).
Cons: Only pulls from the GO consortium / doesn't offer any additional information directly. The API appears somewhat confusing.
Mygene.info
URL: https://mygene.info/
About: https://mygene.info/about
License: Apache 2.0
Pros: Weekly-updated access to gene ontologies. The API is RESTful and the documentation is good.
Cons: Only pulls from the GO consortium / doesn't offer any additional information directly. Unclear how stable the source is.
Harmonizome
URL: https://amp.pharm.mssm.edu/Harmonizome/gene/BRCA1
Pros: Nice visual display of most of the information present in the NCBI gene database, plus a few other sources.
Cons: Doesn't offer much additional information compared to NCBI, and adds another layer of dependency.
GeneNetwork.nl
URL: https://www.genenetwork.nl/faq
Note: All associations are putatively based on co-regulation in bulk RNA-seq.
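The AmiGO entry mentions packaging the ontology file into each release so that no network access is needed. GO publishes its ontology in OBO format (e.g., go-basic.obo); below is a minimal sketch of reading term IDs, names, and namespaces from it. It is deliberately partial (it ignores relationships, synonyms, and most tags), and parse_obo_terms is a hypothetical helper; a maintained library such as ontobio, mentioned earlier in this thread, would be the safer choice in practice.

```python
from typing import Dict, Iterable, Optional

Term = Dict[str, str]


def _store(terms: Dict[str, Term], stanza: Optional[Dict[str, str]]) -> None:
    """Keep a finished [Term] stanza unless it lacks an id or is obsolete."""
    if stanza and "id" in stanza and stanza.get("is_obsolete") != "true":
        terms[stanza["id"]] = {
            "name": stanza.get("name", ""),
            "namespace": stanza.get("namespace", ""),
        }


def parse_obo_terms(lines: Iterable[str]) -> Dict[str, Term]:
    """Map GO IDs to {"name", "namespace"} from the [Term] stanzas of an OBO file."""
    terms: Dict[str, Term] = {}
    stanza: Optional[Dict[str, str]] = None  # None while outside a [Term] stanza
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header closes the previous stanza.
            _store(terms, stanza)
            stanza = {} if line == "[Term]" else None
        elif stanza is not None and ":" in line:
            tag, _, value = line.partition(":")
            # Keep the first value of each tag; repeated tags (is_a, synonym) are ignored.
            stanza.setdefault(tag.strip(), value.strip())
    _store(terms, stanza)  # flush the last stanza in the file
    return terms
```

Usage would be along the lines of `with open("go-basic.obo") as fh: terms = parse_obo_terms(fh)`, with the file shipped alongside the app.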
Other resources with a different use case:
Reverse search (function → genes): https://amp.pharm.mssm.edu/geneshot/api.html
Mendelian disease focus: https://www.omim.org/about
Commercial: GeneCards
Gene set enrichment: webgestalt.org, GeneWeaver, DiVenn
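The point above about names like zebra suggests a guard before fetching anything. A minimal sketch, assuming a simple heuristic: count a var_name as recognized if it looks like an Ensembl gene ID or appears in a known-symbol list (e.g., one exported from NCBI Gene), and fetch only if at least half are recognized. looks_like_gene_ids, the 0.5 threshold, and the source of the symbol list are all hypothetical.

```python
import re
from typing import Iterable, Set

# Ensembl stable gene IDs: "ENS", an optional species code (e.g. "MUS"), "G",
# eleven digits, and an optional ".version" suffix.
ENSEMBL_GENE_RE = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")


def looks_like_gene_ids(
    var_names: Iterable[str],
    known_symbols: Set[str],
    min_fraction: float = 0.5,
) -> bool:
    """Heuristic: only fetch gene-function data if enough var_names are recognized."""
    names = list(var_names)
    if not names:
        return False
    known = {s.lower() for s in known_symbols}  # symbol case differs across species
    recognized = sum(
        1
        for name in names
        if ENSEMBL_GENE_RE.fullmatch(name) or name.lower() in known
    )
    return recognized / len(names) >= min_fraction
```

A fraction-based check tolerates a few unrecognized names (spike-ins, novel transcripts) while still refusing matrices whose index is clearly not genes.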
re: mygene.info
Only pulls from the GO consortium / doesn’t offer any additional information directly
Doesn't this service aggregate a bunch of data sources? https://docs.mygene.info/en/latest/doc/data.html
Was this a mis-copy from the "Gene Ontology Consortium: AmiGO (GOlr)" entry above?
Just a bit more info on mygene.info in case it's useful:
Clearly I'm biased, but seems like you've got several good options for your use case here!
@neuromusic -- yes, that was a copy/paste error, thanks for catching :) @andrewsu -- thanks for sharing! Mygene.info sounds like an awesome tool. I think in this case, we can get the data we need directly from entrez without needing an additional dependency. I'll definitely keep mygene.info in mind if that changes, though!
@sidneymbell @neuromusic @ambrosejcarr it seems as if someone has done the thing:
http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene/apod
{"symbol":"APOD","synonyms":[],"name":"apolipoprotein D","description":"This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. [provided by RefSeq, Aug 2008]","ncbiEntrezGeneId":347,"ncbiEntrezGeneUrl":"http://www.ncbi.nlm.nih.gov/gene/347","proteins":[{"symbol":"APOD_HUMAN","href":"/api/1.0/protein/APOD_HUMAN"}],"hgncRootFamilies":[{"name":"Calycin structural superfamily","href":"/api/1.0/gene_family/Calycin+structural+superfamily"},{"name":"Apolipoproteins (APO)","href":"/api/1.0/gene_family/Apolipoproteins+%28APO%29"}]}
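Since the Harmonizome endpoint returns JSON like the APOD response above, a tooltip could be built directly from its symbol, name, and description fields. A sketch in Python (the same logic would port to a JavaScript front end); tooltip_from_harmonizome and the 200-character cap are hypothetical, and the host in HARMONIZOME_GENE_URL is the one used in this thread, which may have changed since.

```python
import json
from typing import Any, Dict
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint as used in this thread; the host may have moved since.
HARMONIZOME_GENE_URL = "http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene/{symbol}"


def tooltip_from_harmonizome(record: Dict[str, Any], max_chars: int = 200) -> str:
    """Build a short tooltip ("SYMBOL: name" plus a trimmed description)."""
    desc = record.get("description") or ""
    if len(desc) > max_chars:
        desc = desc[: max_chars - 3].rstrip() + "..."
    header = f"{record.get('symbol', '?')}: {record.get('name', '')}"
    return f"{header}\n{desc}" if desc else header


def fetch_harmonizome_gene(symbol: str) -> Dict[str, Any]:
    """GET the gene record as JSON (requires network access)."""
    with urlopen(HARMONIZOME_GENE_URL.format(symbol=quote(symbol))) as resp:
        return json.load(resp)
```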
@colinmegill @sidneymbell @neuromusic @ambrosejcarr Yes, when we were building the Harmonizome at the Ma'ayan lab we made sure to make it CORS compatible (https://clustergrammer.readthedocs.io/biology_specific_features.html#mouseover-gene-name-and-description).
We have this example on ObservableHQ (https://observablehq.com/@ismms-himc/covid-19-transcriptional-signature-tenoever-data-a549?collection=@ismms-himc/ismms-himc-covid-19) that shows you can talk to Enrichr (for enrichment analysis) and Harmonizome via Clustergrammer-GL and some REST GET requests.
I do apologize for not realizing this was JSON, in the thread above :)
The NCBI recommendation cited by @sidneymbell has a relatively simple set of web tools.
CD8A: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=925
APOD: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=347
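A sketch of building these E-utilities URLs programmatically, plus parsing an esummary response. Assumptions: that esummary with retmode=json returns {"result": {"uids": [...], "<uid>": {..., "summary": ...}}} for db=gene (worth checking against the E-utilities documentation), and that sending several IDs comma-separated in one request is preferable to one request per gene, given NCBI's request-rate limits. eutils_url and summaries_from_esummary are hypothetical helpers.

```python
from typing import Dict, Iterable, Union
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

GeneId = Union[int, str]


def eutils_url(tool: str, db: str, ids: Union[GeneId, Iterable[GeneId]], **params: str) -> str:
    """Build an E-utilities URL such as efetch or esummary for db=gene.

    `ids` may be one Entrez Gene ID or several; several IDs are joined
    into a single comma-separated `id` parameter.
    """
    if isinstance(ids, (int, str)):
        ids = [ids]
    query = {"db": db, "id": ",".join(str(i) for i in ids), **params}
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(query, safe=',')}"


def summaries_from_esummary(payload: dict) -> Dict[str, str]:
    """Map each returned UID to its free-text summary (empty if absent)."""
    result = payload.get("result", {})
    return {uid: result.get(uid, {}).get("summary", "") for uid in result.get("uids", [])}
```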
Of note, look for the "representative expression" section at the bottom: it lists tissues in which expression of the gene has been established.
@Alokito mentioned that it would be a good idea for us to enable cellxgene to read from multiple cell databases. For companies, this would let them connect cellxgene to their own internal metadata repositories. For us, it would make it easier to swap between feature namespaces (protein, DNA, transcripts, genes) and ensure cellxgene remains a general tool. The requirement would be that the database index overlaps with the var index in cellxgene. We could also enable the feature to read from .var metadata by default.
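One way to structure the multiple-database idea: a small lookup interface that any source implements, with .var metadata as the default, plus a check of how much of the var index a given source covers. This is a hypothetical design sketch (AnnotationSource, VarMetadataSource, index_overlap, and first_annotation are not existing cellxgene classes), and .var is modeled as a plain dict rather than a pandas DataFrame to keep it dependency-free.

```python
from typing import Iterable, Mapping, Optional, Protocol, Sequence

Record = Mapping[str, str]


class AnnotationSource(Protocol):
    """Anything that can return metadata for a feature ID."""

    def lookup(self, feature_id: str) -> Optional[Record]:
        ...


class VarMetadataSource:
    """Default source: annotations stored in the dataset's own .var columns."""

    def __init__(self, var: Mapping[str, Record]) -> None:
        self._var = var  # {feature_id: {column: value}}

    def lookup(self, feature_id: str) -> Optional[Record]:
        return self._var.get(feature_id)


def index_overlap(var_index: Iterable[str], source: AnnotationSource) -> float:
    """Fraction of the var index that `source` can annotate."""
    ids = list(var_index)
    if not ids:
        return 0.0
    return sum(source.lookup(i) is not None for i in ids) / len(ids)


def first_annotation(feature_id: str, sources: Sequence[AnnotationSource]) -> Optional[Record]:
    """Try sources in priority order and return the first hit."""
    for source in sources:
        record = source.lookup(feature_id)
        if record is not None:
            return record
    return None
```

Putting the .var source first in the list gives dataset-provided metadata priority; an internal company database or a public API would just be another object with a lookup method.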
Hello, g:Profiler is another source for you to look at: https://biit.cs.ut.ee/gprofiler/gost
It supports all Ensembl organisms and already has a Python API: https://pypi.org/project/gprofiler-official/ and https://biit.cs.ut.ee/gprofiler/page/apis
A good example of protein contextualization here (thanks Jonah and @ambrosejcarr): https://opencell.czbiohub.org/
Is there a way for a user to browse var (gene metadata) in cellxgene (e.g., to decide which genes to plot later on)?
Hi @Hrovatin, there is not a way to browse var in cellxgene. You can see if a gene exists in a dataset using the "Autosuggest gene" functionality in the top right corner which will autocomplete genes from the var index.
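For illustration only, a case-insensitive prefix search over the var index using a sorted list and binary search. This is not how cellxgene implements its autosuggest; GeneAutosuggest is a hypothetical class.

```python
import bisect
from typing import List, Sequence


class GeneAutosuggest:
    """Case-insensitive prefix search over a var index."""

    def __init__(self, var_names: Sequence[str]) -> None:
        # Sort (lowercased key, original name) pairs once so each lookup is
        # a binary search rather than a scan over every gene.
        self._entries = sorted((name.lower(), name) for name in var_names)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 10) -> List[str]:
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out: List[str] = []
        for i in range(start, len(self._entries)):
            key, name = self._entries[i]
            if not key.startswith(p) or len(out) >= limit:
                break
            out.append(name)
        return out
```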
I think it's important to contextualize diffexp results by what these genes do, biologically. There are multiple "levels" at which we could support this:
1) Just link out to wikigene / human protein atlas / gene ontologies as Ben suggests below.
2) Group differentially expressed genes by ontology classification/tags, make this an option to color by (e.g., I find that cell set 1 is expressing high levels of a bunch of genes that are involved in "lipid metabolism". Color all cells by their mean expression of genes with this GO tag.)
3) Support GO enrichment analysis in-app
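Level 3 usually means an over-representation test. A minimal sketch, assuming a one-sided hypergeometric test per GO term with the dataset's genes as the background; go_enrichment and hypergeom_sf are hypothetical helpers, and a real implementation would also need multiple-testing correction (e.g., Benjamini-Hochberg). With the variable names below, scipy.stats.hypergeom.sf(k - 1, N, K, n) should give the same tail probability.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact big-int ratio, rounded once to float


def go_enrichment(
    query: Iterable[str],
    universe: Iterable[str],
    term_genes: Dict[str, Set[str]],
) -> List[Tuple[str, int, float]]:
    """Return (term, overlap, p-value) sorted by p-value; no FDR correction."""
    universe_set = set(universe)
    query_set = set(query) & universe_set
    N, n = len(universe_set), len(query_set)
    results = []
    for term, genes in term_genes.items():
        annotated = genes & universe_set
        k = len(annotated & query_set)
        if k == 0:
            continue  # terms with no overlap cannot be enriched
        results.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    return sorted(results, key=lambda r: r[2])
```

Restricting both the query and the term gene sets to the dataset's own genes keeps the background consistent with what could actually have been measured.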
From my notes:
From Ben Humphreys' notes: