broadinstitute / jump_hub

Collection of JUMP documentation and projects for internal and public consumption

Find Evotec gene connections to pursue: exploration for MorphMap paper Evotec; [ORFs only] OR [CRISPRs only] #11

Closed tjetkaARD closed 2 weeks ago

tjetkaARD commented 11 months ago

Here I report the heatmap for the top similar and top anti-similar pairs that have no known evidence behind them.

Procedure:

  1. Select the top 20 similar pairs according to cosine similarity of ORF profiles (from the Excel file)
  2. Select the top 20 anti-similar pairs according to cosine similarity of ORF profiles (from the Excel file)
  3. Remove a pair if it has (an average Knowledge Graph score above 0.5) OR (at least one Knowledge Graph score above 0.8)
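The three steps above can be sketched roughly as follows. This is only an illustration, not the code used for the plot: the record layout (`pair`, `cosine`, `kg_scores`), the function names, and the choice to treat a pair with no KG scores as "no known evidence" are all assumptions made for the example.

```python
from statistics import mean


def has_known_evidence(kg_scores, avg_threshold=0.5, single_threshold=0.8):
    """Step 3: average KG score above 0.5 OR at least one KG score above 0.8."""
    if not kg_scores:
        # Assumption: a pair with no KG scores counts as having no known evidence.
        return False
    return mean(kg_scores) > avg_threshold or max(kg_scores) > single_threshold


def select_unknown_pairs(pairs, top_n=20):
    """Return (similar, anti_similar) pairs that survive the KG filter.

    `pairs` is a list of dicts with hypothetical keys:
    'pair' (tuple of gene names), 'cosine' (float), 'kg_scores' (list of floats).
    """
    ranked = sorted(pairs, key=lambda p: p["cosine"], reverse=True)
    top_similar = ranked[:top_n]        # step 1: highest cosine similarity
    top_anti = ranked[::-1][:top_n]     # step 2: lowest (most negative) cosine similarity
    # Step 3 is applied after selection, so fewer than top_n pairs may remain.
    similar = [p for p in top_similar if not has_known_evidence(p["kg_scores"])]
    anti = [p for p in top_anti if not has_known_evidence(p["kg_scores"])]
    return similar, anti
```

Because the filter is applied after the top-20 selection, as in the procedure, each list can end up with fewer than 20 pairs.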

orfs_heatmap_cosine_Unknown Top_Anti 20_labels

Code: The value in each square indicates the average Evotec KG score.

Unfortunately, I am unable to create a similar plot for CRISPRs without data from Evotec (data exist only for pairs that intersect with ORFs). Alternatively, I can create it using the STRINGdb Knowledge Graph, but it will not be replicable.

Edit: I updated the plot with the updated data from the Evotec KG. Major change: the estimate of known association between HOOK2 and NDE1/NDEL1/PAFAH1B1 changed, hence it was removed.

AnneCarpenter commented 10 months ago

Ok, what looks worth pursuing from the ORFs-only plot are the following:

AnneCarpenter commented 10 months ago

(perhaps @tjetkaARD can add the CRISPRs-only plot to this issue and adjust its title, since the ORFs were a simple story that is now split off into other issues, and I referred to this issue as the place where the CRISPR-only plot will go!)

tjetkaARD commented 10 months ago

@AnneCarpenter

  1. Yes, let's do it here. For CRISPR, I would need additional KG data for the gene pairs from this file, @auranic: crispr-cp-replicable-top-correlated.csv. Thanks a lot!

  2. After the KG update, there is one major change for ORFs in the HOOK2/NDE1 cluster, reported in detail in https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/5

AnneCarpenter commented 10 months ago

For your first question I just forwarded an email with the needed information. This was my oversight in not sending it earlier, so sorry!

For the 2nd question, I think what you're saying is that we previously thought this was a novel discovery, but now it seems the connection among these genes is well known? Can we move that discussion over to #5?

auranic commented 10 months ago

@tjetkaARD Please find the KG scores for the CRISPR gene pairs: https://drive.google.com/drive/folders/1QWY8itTMeR3pGIt2kWIS5NiOPLhazg4S?usp=sharing (let me know if you need all of them from your top list)

For 2, I answered here: https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/5#issuecomment-1901559303

auranic commented 10 months ago

For an overview of the ORF links to pursue, I would like to draw your attention to this slide:

image

It should roughly match the heatmaps above.

It should be self-explanatory, but I will be happy to provide more info. The network file is here: https://drive.google.com/file/d/16J5D4Wiuh-r2IuT3hA8grAQUggwS8Rc7/view?usp=sharing

tjetkaARD commented 10 months ago

Here I report the heatmap for the top similar and top anti-similar CRISPR pairs that have no known evidence behind them (as defined by the Knowledge Graph).

Procedure:

Heatmap

crispr_heatmap_cosine_Unknown Top_Anti 10_labels

Code: The value in each square indicates the average Evotec KG score.
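As a rough illustration of how such per-square labels could be assembled, the sketch below builds a symmetric gene-by-gene grid of average KG scores. The function name, the `frozenset` keys, the placeholder gene names, and the use of "?" for pairs with no KG score are all assumptions made for the example, not the code behind the plot.

```python
def build_annotation_grid(genes, avg_kg_by_pair, missing="?"):
    """Build a symmetric grid of text labels for a gene-by-gene heatmap.

    `avg_kg_by_pair` maps frozenset({gene_a, gene_b}) to an average KG score
    (hypothetical layout). Pairs absent from the mapping get the `missing` label.
    """
    grid = []
    for row_gene in genes:
        row = []
        for col_gene in genes:
            if row_gene == col_gene:
                row.append("")  # leave the diagonal unlabeled
                continue
            # frozenset makes the lookup independent of gene order
            score = avg_kg_by_pair.get(frozenset((row_gene, col_gene)))
            row.append(missing if score is None else f"{score:.2f}")
        grid.append(row)
    return grid


# Placeholder gene names and a made-up score, for illustration only.
example_genes = ["GENE_A", "GENE_B", "GENE_C"]
example_scores = {frozenset(("GENE_A", "GENE_C")): 0.4}
example_grid = build_annotation_grid(example_genes, example_scores)
```

Keying on an unordered pair avoids the case where a score stored as (A, B) is not found when looked up as (B, A), which is one way a label could wrongly show up as missing.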

Clusters

  1. KAT5 and ZSCAN9 - a histone acetyltransferase and a zinc finger protein. Both are located in the nucleus. Data on ZSCAN9 are scarce. Nonetheless, the IntAct database indicates some evidence of a physical interaction between the two in yeast (https://www.ebi.ac.uk/intact/search?query=ENSG00000137185). Both are low-level, DNA-related expression regulators. Similar phenotypic profiles could in fact indicate a common mechanism. There is, however, nothing to start with in this respect.

  2. ABCC10, MYH13, SRD5A1, GRK5, TBXA2R anti-correlated versus POLR2A, PSMD14, MDM2, USPL1, NXF1, EIF4A3, VCP

    • The second gene cluster is responsible for cellular and protein metabolic processes and proteolysis; it includes polymerase, proteasome, and translation-related proteins.
    • The first gene cluster is much less well defined. TBXA2R, MYH13, and SRD5A1 are involved in the response to extracellular stimuli (not very sensitive). Only the link between GRK5 and TBXA2R is somewhat known. ABCC10 is a multidrug resistance-associated protein (a transporter), TBXA2R is a platelet aggregation receptor, and GRK5 is a GPCR kinase.
  3. CHST8, APOE, DYRK1B, LAIR1, NLRP9, KLK15, SLC17A7, CYP2A7, ECH1, SLC1A5, LIPE, LILRB4 anti-correlated versus DLX5, ZNF689

auranic commented 10 months ago

@tjetkaARD I wonder why there are still question marks in the heatmap? These pairs must be in the CRISPR data (e.g., I have just manually checked the GRK5-ECH1 pair: small similarity score, not connected in KG)

tjetkaARD commented 10 months ago

@auranic - yes indeed, a suboptimal procedure on my side. I will correct and update it when I have spare time.

AnneCarpenter commented 10 months ago

We realized the CRISPR data needs chromosome arm correction, so will re-make the heatmaps (but the top correlation relationships aren't likely to change much, we hope)