Open AnneCarpenter opened 8 months ago
I edited the title to correct conclusions with updated data.
Waiting for chromosome-arm-corrected CRISPR data from @zahrahanifehlou before proceeding (probably it will not change the results but we don't want to waste time in case it does!)
(we are also waiting for tools from @tjetkaARD to make heatmaps that include KG+ and KG- genes to provide more context when we are zooming into a cluster with some KG- connections in it.)
The heatmap shows the percentile of the cosine similarities (1 → similar, 0 → anti-similar). The text is the maximum of the absolute KG score (gene_mf__go, gene_bp_go, gene_pathway). I set a KG threshold (like we previously had) of 0.4. If a connection has a score lower than this threshold, the connection is considered unknown. The KG scores were downloaded from Google Drive: ORF and CRISPR. The diagonal of the heatmap indicates whether a gene has a phenotype (False could also mean the gene is not present in the dataset).
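For anyone reproducing the heatmap annotations, here is a minimal sketch of the two quantities described above: the maximum absolute KG score with the 0.4 threshold, and a percentile rank of cosine similarities. The function names, the rank formula, and the example values are assumptions made for illustration; the actual notebook may compute the percentile differently.

```python
import math

KG_THRESHOLD = 0.4  # threshold quoted in this thread


def max_abs_kg_score(scores):
    """Largest absolute KG score across sources, skipping missing values."""
    values = [abs(v) for v in scores.values()
              if v is not None and not math.isnan(v)]
    return max(values) if values else None


def is_known_connection(scores, threshold=KG_THRESHOLD):
    """A connection counts as known only if its score reaches the threshold."""
    score = max_abs_kg_score(scores)
    return score is not None and score >= threshold


def percentile_ranks(values):
    """Map each value to [0, 1]: 0 for the smallest, 1 for the largest.

    Assumption: a simple rank-based percentile; ties share the lower rank.
    """
    n = len(values)
    if n < 2:
        return [1.0] * n
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]


# Illustrative per-source values only (not taken from the real KG files).
example = {"gene_mf__go": 0.12, "gene_bp_go": -0.343, "gene_pathway": None}
print(max_abs_kg_score(example), is_known_connection(example))
print(percentile_ranks([-0.9, 0.1, 0.8]))
```

Taking the absolute value before the maximum means a strongly negative KG score still counts as a known connection, which matches the "maximum of the absolute KG score" wording.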
These genes are similar in both ORF and CRISPR. The SARS2-UQCRFS1 connection seems to be known. Other connections are not known.
In my opinion, it is quite an interesting subset.
Agree, let's pursue this - let's be sure that we re-create the clusters based on what are the nearest neighbors of the genes involved rather than including genes just because they were in the original clusters with old profiles.
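A rough sketch of what re-creating a cluster from nearest neighbors could look like: start from the seed genes and add each seed's top-k neighbors by cosine similarity in the current profiles. The helper names, the choice of k, and the toy profiles are hypothetical; real profiles would come from the updated ORF and CRISPR data.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(profiles, query, k):
    """Top-k genes most similar to `query`, as (gene, similarity) pairs."""
    sims = [(gene, cosine_similarity(profiles[query], vec))
            for gene, vec in profiles.items() if gene != query]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def recluster(profiles, seeds, k):
    """Seed genes plus each seed's top-k neighbors in the current profiles."""
    members = set(seeds)
    for seed in seeds:
        members.update(gene for gene, _ in nearest_neighbors(profiles, seed, k))
    return members


# Toy 2-D profiles with placeholder gene names, for illustration only.
toy = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [-1.0, 0.0]}
print(nearest_neighbors(toy, "A", 1))
print(sorted(recluster(toy, ["A", "C"], 1)))
```

Because membership is rebuilt from the seeds each time, genes that sat in the original cluster only because of the old profiles drop out unless they are still among the nearest neighbors.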
Alan has gathered some info about the genes here: https://docs.google.com/document/d/1zKkDpBWbb3NnQhlX34LEWuuZxy5Rotre5uuuBMTxvxY/edit#heading=h.92upd2fec7b1
The 3 core genes mentioned in the issue title (ECH1, UQCRFS1, SARS2) are tightly linked in ORF and CRISPR; the other genes shown for ORF and CRISPR are non-overlapping, so we will probably drop them.
Two of the three pairs among the 3 core genes are relatively well known to be involved with each other (KG scores .446 and .638); ECH1 and SARS2 are less so (KG .343, below the 0.4 threshold), so that would be the one potential 'discovery' here.
This site shows that ECH1 and SARS2 are each other's top nearest neighbours based on cell line RNA expression: https://www.proteinatlas.org/ENSG00000104823-ECH1/cell+line
It's possible some quick searches might reveal other links from other data sources.
The 3 genes are mitochondrial-related so this relationship seems supported by extra info and could be presented as a brief story, just needs synthesis/writing up.
Plex analysis of the 52 genes most similar to this cluster identifies multiple connections to mitochondrial function. (Plex search). Note that this link is to a search of human+mouse+rat orthologs, so the displayed number of searched items is >52.
Of the 52 genes most similar to this cluster in our ORF dataset, the majority are mitochondria-associated (GOCC_Mitochondrion), including 16 mitochondrial disease-associated genes (Ochoa et al. 2023). Knockdown or overexpression of LINC00473, a regulator of lipolysis and mitochondrial respiration, was shown to downregulate 15 of these genes (Tran et al. 2020), and knockdown of the mitochondrial chaperone PHB2 resulted in downregulation of 14 genes in this 52-gene cluster (Liu et al. 2017). Additionally, in proteomic profiling data from cells treated with a library of 875 compounds, 5 of the 10 profiles with the greatest overlap with this 52-gene cluster were from inhibitors of the PI3K/MTOR pathway (Mitchell et al. 2023), which regulates both mitochondrial function and biogenesis (Morita et al. 2017). @AnneCarpenter
Great, here is the draft text put into the main paper accordingly. @niranjchandrasekaran ready for you to finalize figures, etc.
ECH1, UQCRFS1, and SARS2 cluster together and are implicated in mitochondrial function and cancer
We found that three enzymes are strongly correlated in both the ORF and CRISPR profiles: ECH1 (enoyl-CoA hydratase 1), UQCRFS1 (ubiquinol-cytochrome c reductase, Rieske iron-sulfur polypeptide 1), and SARS2 (seryl-tRNA synthetase 2, mitochondrial) (Fig Xx). The connections among them are not well known; UQCRFS1 and SARS2 are connected in the knowledge graph via databases, not by literature reports, and the remaining connections have low knowledge graph scores (Fig Xx). Some existing data support these new connections: SARS2 is the most highly correlated gene with ECH1 in terms of cell line RNA expression; SARS2 and ECH1 are the 5th and 10th top matches, respectively, for UQCRFS1 (https://www.proteinatlas.org/ENSG00000104823-ECH1/cell+line). Among the 52 genes most similar to this cluster in the ORF dataset, analyzed using the Plex web application, the majority are mitochondria-associated (GOCC_Mitochondrion), including 16 mitochondrial disease-associated genes (Ochoa et al. 2023). Knockdown or overexpression of LINC00473, a regulator of lipolysis and mitochondrial respiration, downregulates 15 of these genes (Tran et al. 2020), and knockdown of the mitochondrial chaperone PHB2 downregulates 14 genes in the cluster (Liu et al. 2017). Additionally, in proteomic profiling data from cells treated with a library of 875 compounds, 5 of the 10 profiles with the greatest overlap with this 52-gene cluster were from inhibitors of the PI3K/MTOR pathway (Mitchell et al. 2023), which regulates both mitochondrial function and biogenesis (Morita et al. 2017). Given recent interest in UQCRFS1 as a mitochondrial-related oncology biomarker and drug target (Sun et al. 2023), SARS2 and ECH1 merit attention as well.
This cluster was found in #7 as having strong (+/-) correlation in both ORF and CRISPR but not being (completely) strongly connected in the KG.
Looking across ORF and CRISPR plots there, there are a few other adjacent genes we should consider adding to this and then figure out a story for them (if needed, contacting a biologist who studies some subset of these genes).
For CRISPR, UQCRFS1 is p-value replicable. Otherwise, all genes are q-value replicable (for ORF & CRISPR).
Edited by Tomasz Jetka: updated title and image according to update in https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/7#issuecomment-1901252123