broadinstitute / 2023_12_JUMP_data_only_vignettes

Collection of JUMP documentation and projects for internal and public consumption

Cluster ECH1, UQCRFS1, SARS2: exploration for MorphMap paper (ORF+CRISPR) #16

Open AnneCarpenter opened 8 months ago

AnneCarpenter commented 8 months ago

This cluster was found in #7 as strong (+/-) correlation in both ORF and CRISPR but not (completely) strongly connected in the KG.

Looking across ORF and CRISPR plots there, there are a few other adjacent genes we should consider adding to this and then figure out a story for them (if needed, contacting a biologist who studies some subset of these genes).

image

For CRISPR, UQCRFS1 is p-value replicable. Otherwise, all genes are q-value replicable (for ORF & CRISPR).

Edited by Tomasz Jetka: updated title and image according to update in https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/7#issuecomment-1901252123

tjetkaARD commented 8 months ago

I edited the title to correct the conclusions based on the updated data.

AnneCarpenter commented 7 months ago

Waiting for chromosome-arm-corrected CRISPR data from @zahrahanifehlou before proceeding (probably it will not change the results but we don't want to waste time in case it does!)

(we are also waiting for tools from @tjetkaARD to make heatmaps that include KG+ and KG- genes to provide more context when we are zooming into a cluster with some KG- connections in it.)

niranjchandrasekaran commented 3 months ago

Notebook

The heatmap shows the percentile of the cosine similarities (1 → similar, 0 → anti-similar). The text is the maximum of the absolute KG scores (gene_mf__go, gene_bp_go, gene_pathway). I set a KG threshold of 0.4 (as we had previously). If a connection has a score lower than this threshold, it is considered unknown. The KG scores were downloaded from Google Drive: ORF and CRISPR. The diagonal of the heatmap indicates whether a gene has a phenotype (False could also mean the gene is not present in the dataset).
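For readers who want to see the two quantities in the heatmap spelled out, here is a minimal sketch, not the code from the linked notebook. It assumes one profile vector per gene, assumes percentiles are ranked against all off-diagonal gene pairs (the exact ranking used in the notebook is not stated here), and uses made-up KG scores. The function names `cosine_similarity_matrix`, `similarity_percentiles`, and `kg_label` are hypothetical; only the 0.4 threshold and the three KG score names come from the comment above.

```python
import numpy as np

KG_THRESHOLD = 0.4  # threshold stated in the comment above


def cosine_similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (one row per gene profile)."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return unit @ unit.T


def similarity_percentiles(sim: np.ndarray) -> np.ndarray:
    """Fraction of off-diagonal gene pairs with similarity <= each entry.

    Assumption: percentiles are computed against all gene pairs in the matrix.
    """
    upper = np.triu_indices(sim.shape[0], k=1)
    reference = np.sort(sim[upper])
    return np.searchsorted(reference, sim, side="right") / reference.size


def kg_label(source_scores: dict, threshold: float = KG_THRESHOLD):
    """Return (maximum absolute KG score, 'known' or 'unknown')."""
    score = max(abs(v) for v in source_scores.values())
    return score, ("known" if score >= threshold else "unknown")


# Toy data: 5 random "gene" profiles with 20 features each (illustrative only).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(5, 20))
pct = similarity_percentiles(cosine_similarity_matrix(profiles))

# Made-up KG scores for a single gene pair (illustrative only).
example_scores = {"gene_mf__go": 0.12, "gene_bp_go": -0.45, "gene_pathway": 0.30}
print(kg_label(example_scores))
```

Taking the absolute value before the maximum means a strongly negative KG score still counts as a known connection, which matches the "maximum of the absolute KG scores" wording.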

These genes are similar in both ORF and CRISPR. The SARS2-UQCRFS1 connection seems to be known. The other connections are not known.

ORF

ORF-connections-ECH1-SARS2-UQCRFS1

CRISPR

CRISPR-connections-ECH1-SARS2-UQCRFS1

tjetkaARD commented 3 months ago

In my opinion, it is quite an interesting subset.

AnneCarpenter commented 3 months ago

Agree, let's pursue this - let's be sure that we re-create the clusters based on what are the nearest neighbors of the genes involved rather than including genes just because they were in the original clusters with old profiles.
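One way to read "re-create the clusters from nearest neighbors" is sketched below. This is an illustration under stated assumptions, not the procedure used in the notebooks: it assumes cosine similarity on per-gene profiles, takes the `k` most similar genes for each core gene, and uses toy vectors. `nearest_neighbors`, `recreate_cluster`, `GENE_A`, and `GENE_B` are hypothetical names.

```python
import numpy as np


def nearest_neighbors(profiles, genes, query, k=5):
    """Return the k genes most cosine-similar to `query`, excluding itself."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    qi = genes.index(query)
    sims = unit @ unit[qi]
    sims[qi] = -np.inf  # never report the query gene as its own neighbor
    order = np.argsort(sims)[::-1][:k]
    return [genes[i] for i in order]


def recreate_cluster(profiles, genes, core, k=5):
    """Core genes plus each core gene's k nearest neighbors, without duplicates."""
    members = list(core)
    for gene in core:
        for neighbor in nearest_neighbors(profiles, genes, gene, k):
            if neighbor not in members:
                members.append(neighbor)
    return members


# Toy profiles (illustrative only): three similar genes and two unrelated ones.
genes = ["ECH1", "UQCRFS1", "SARS2", "GENE_A", "GENE_B"]
profiles = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(nearest_neighbors(profiles, genes, "ECH1", k=2))
print(recreate_cluster(profiles, genes, ["ECH1"], k=2))
```

To also capture strong anti-correlations, the ranking could use the absolute similarity instead; that choice is left open here.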

niranjchandrasekaran commented 1 month ago

Notebook

Here are the recreated clusters

ORF

ORF-connections-DGUOK-ECH1-LDHAL6B-MRPS2-SARS2-UQCRFS1

CRISPR

CRISPR-connections-ECH1-LAIR1-PVR-SARS2-SLC1A5-UQCRFS1

niranjchandrasekaran commented 1 month ago

Notebook

This connection is not affected by plate layout

ORF

ORF-plate-layout-MRPS2-SARS2-LDHAL6B-ECH1-DGUOK-UQCRFS1

CRISPR

CRISPR-plate-layout-PVR-UQCRFS1-SARS2-ECH1-LAIR1-SLC1A5

AnneCarpenter commented 1 month ago

Alan has gathered some info about the genes here: https://docs.google.com/document/d/1zKkDpBWbb3NnQhlX34LEWuuZxy5Rotre5uuuBMTxvxY/edit#heading=h.92upd2fec7b1

The 3 core genes mentioned in the issue title (ECH1, UQCRFS1, SARS2) are tightly linked in ORF and CRISPR; the other genes shown for ORF and CRISPR are non-overlapping, so we will probably drop them.

The 3 core genes are relatively well known to be connected to each other (KG scores .446, .638), except ECH1 and SARS2, which are less so (KG score .343, below the 0.4 threshold), so that pair would be the one potential 'discovery' here.

This site shows the two genes are the top nearest neighbours of each other based on cell line RNA expression: https://www.proteinatlas.org/ENSG00000104823-ECH1/cell+line

It's possible some quick searches might reveal other links from other data sources.

The 3 genes are mitochondrial-related, so this relationship seems supported by extra info and could be presented as a brief story; it just needs synthesis/writing up.

jgaetz-plex commented 6 days ago

Plex analysis of the 52 genes most similar to this cluster identifies multiple connections to mitochondrial function (Plex search). Note that this link is to a search of human+mouse+rat orthologs, so the displayed number of searched items is >52.

jgaetz-plex commented 6 days ago

Of the 52 genes most similar to this cluster in our ORF dataset, the majority are mitochondria-associated (GOCC_Mitochondrion), including 16 mitochondrial disease-associated genes (Ochoa et al. 2023). Knockdown or overexpression of LINC00473, a regulator of lipolysis and mitochondrial respiration, was shown to downregulate 15 of these genes (Tran et al. 2020), and knockdown of the mitochondrial chaperone PHB2 resulted in downregulation of 14 genes of this 52-gene cluster (Liu et al. 2017). Additionally, in proteomic profiling data from cells treated with a library of 875 compounds, 5 of the 10 profiles with the greatest overlap with this 52-gene cluster were from inhibitors of the PI3K/MTOR pathway (Mitchell et al. 2023), which regulates both mitochondrial function and biogenesis (Morita et al. 2017). @AnneCarpenter

AnneCarpenter commented 6 days ago

Great, here is the draft text put into the main paper accordingly. @niranjchandrasekaran ready for you to finalize figures, etc.

ECH1, UQCRFS1, and SARS2 cluster and are implicated in mitochondrial function and cancer

We found that three enzymes strongly correlate in both the ORF and CRISPR profiles: ECH1 (enoyl-CoA hydratase 1), UQCRFS1 (ubiquinol-cytochrome c reductase, Rieske iron-sulfur polypeptide 1), and SARS2 (seryl-tRNA synthetase 2, mitochondrial) (Fig Xx). The connections among them are not well known; UQCRFS1 and SARS2 are connected in the knowledge graph via databases, not by literature reports, and the remaining connections show weak or low knowledge graph scores (Fig Xx). Some existing data support these new connections: SARS2 is the most highly correlated gene with ECH1 in terms of cell line RNA expression; SARS2 and ECH1 are the 5th and 10th top matches, respectively, for UQCRFS1 (https://www.proteinatlas.org/ENSG00000104823-ECH1/cell+line). When the top 52 genes most similar to this cluster in the ORF dataset are analyzed using the Plex web application, the majority are mitochondria-associated (GOCC_Mitochondrion), including 16 mitochondrial disease-associated genes (Ochoa et al. 2023). Knockdown or overexpression of LINC00473, a regulator of lipolysis and mitochondrial respiration, downregulates 15 of these genes (Tran et al. 2020), and knockdown of the mitochondrial chaperone PHB2 downregulates 14 genes in the cluster (Liu et al. 2017). Additionally, in proteomic profiling data from cells treated with a library of 875 compounds, 5 of the 10 profiles with the greatest overlap with this 52-gene cluster were from inhibitors of the PI3K/MTOR pathway (Mitchell et al. 2023), which regulates both mitochondrial function and biogenesis (Morita et al. 2017). Given recent interest in UQCRFS1 as a mitochondrial-related oncology biomarker and drug target (Sun et al. 2023), SARS2 and ECH1 merit attention as well.