broadinstitute / 2023_12_JUMP_data_only_vignettes

Collection of JUMP documentation and projects for internal and public consumption

MYT1-RNF41 exploration for MorphMap paper (ORF) #3

Open afermg opened 9 months ago

afermg commented 9 months ago

From: Anne Carpenter anne@broadinstitute.org Date: Fri, Dec 1, 2023 at 2:44 PM

Hi Ardigen/Ksilink, Using this attached plot from Ardigen of the top 25 most anti-correlated pairs, I saw MYT1-RNF41.

So, I found someone who studies MYT1 who is willing to collaborate. We told her that MYT1 has the 'opposite' profile to RNF41 and she looked in some unpublished data of hers and she sees that MYT1 (a repressor) binds to the promoter of RNF41! Neat! (please keep that confidential)

She may design further followups to add to MorphMap. I'm not sure it's necessary but it may help to have more info about MYT1:

How feasible is it to generate a list of top-10 genes that correlate/anti-correlate to MYT1 (with the correlation values so we see the strength)? This was from the CRISPR data, but it looks like we tested the gene in ORFs too so we could create both sets of lists.

We'd also like to tell her what is the morphology diff for MYT1 vs neg controls, but I think we will start by pulling up a few images on our side, because that is a hard Q to answer via informatics (we wrote a blog post of steps but it's fairly unsatisfying/time-consuming).

Screenshot 2023-12-01 at 2 33 44 PM

From: Anne Carpenter anne@broadinstitute.org Date: Tue, Dec 5, 2023 at 11:52 AM

I'm adding Alán on our team who has been working on this connection specifically. He pulled images for us but we didn't see anything dramatic by eye (there is a LOT of variation from one replicate to the next, which makes this harder and is discouraging but I guess it explains batch effects we see in the data!)

Alán found the attached relationships at the replicate level in CRISPR data (Harmony-corrected). We expected a negative correlation between the two genes, and it looks like we see that for 2 replicates of RNF41 but not all of them. That is rather odd, as is the fact that a single replicate of the MYT1s is anti-correlated with ALL of the RNF41.

I'm guessing this pair of genes was in the top-25 of Ardigen's "negative correlation" list because after averaging (median, I think) it was still a strong relationship, but still it's a little surprising.

Ardigen, can you confirm you see a similar thing with replicates? I believe you are using the exact same Harmony-corrected profiles.

I guess this evidence is strong enough to continue working on this pair even though a few replicates were not consistent... Ksilink have you seen anything odd going on between replicates in the data in general?

I imagine this could either be:

  1. bad batch effects that weren't corrected enough.
  2. batch effects that were corrected too much, in a way that introduced errors.
  3. some replicates (maybe some whole batches) just didn't CRISPR-knock-down effectively for technical reasons.

I welcome any thoughts anyone has on the matter! I will proceed to meet with the scientist in ~24 hours by zoom to see if she has some experiments to run, or if we can include the unpublished data she has already.

correlations_pearson_harmony_myt1_rnf42


Edit by Niranj: Code for generating the following list: https://github.com/jump-cellpainting/morphmap/blob/c3393f985cb0a2c1a906ca8438f105eb785ce4de/12.explore-correlations-anticorrelations/4.explore-myt1.ipynb

From: Niranj Chandrasekaran csriniva@broadinstitute.org Date: Wed, Dec 6, 2023 at 6:45 PM

I created the list of top correlated and anti correlated genes in the ORF dataset. RNF41 appears at the top. I have not checked whether the results look different at the replicate level or not.

Top correlated

| Metadata_broad_sample_1 | Metadata_broad_sample_2 | Cosine_similarity | Gene_1 | Gene_2 |
|---|---|---|---|---|
| ccsbBroad304_10986 | ccsbBroad304_04392 | 0.65285605 | MYT1 | LZTS2 |
| ccsbBroad304_10986 | ccsbBroad304_01581 | 0.62539047 | MYT1 | SOX15 |
| ccsbBroad304_10986 | ccsbBroad304_10195 | 0.59890395 | MYT1 | LGALS9C |
| ccsbBroad304_10986 | ccsbBroad304_09189 | 0.57654506 | MYT1 | LZTS2 |
| ccsbBroad304_10986 | ccsbBroad304_04393 | 0.55127573 | MYT1 | LCOR |
| ccsbBroad304_10986 | ccsbBroad304_11168 | 0.53719974 | MYT1 | TACC1 |
| ccsbBroad304_10986 | ccsbBroad304_13816 | 0.5334892 | MYT1 | ASCL2 |
| ccsbBroad304_10986 | ccsbBroad304_15450 | 0.529622 | MYT1 | HNRNPC |
| ccsbBroad304_10986 | ccsbBroad304_01085 | 0.529373 | MYT1 | NEUROD1 |
| ccsbBroad304_10986 | ccsbBroad304_04658 | 0.52150875 | MYT1 | ERMAP |

Top anti-correlated

| Metadata_broad_sample_1 | Metadata_broad_sample_2 | Cosine_similarity | Gene_1 | Gene_2 |
|---|---|---|---|---|
| ccsbBroad304_10986 | ccsbBroad304_02346 | -0.6578171 | MYT1 | RNF41 |
| ccsbBroad304_10986 | ccsbBroad304_11513 | -0.5148919 | MYT1 | DDX17 |
| ccsbBroad304_10986 | ccsbBroad304_00946 | -0.512445 | MYT1 | LMNA |
| ccsbBroad304_10986 | ccsbBroad304_10479 | -0.4869117 | MYT1 | FAM81A |
| ccsbBroad304_10986 | ccsbBroad304_04413 | -0.4827406 | MYT1 | TRIM55 |
| ccsbBroad304_10986 | ccsbBroad304_04404 | -0.460314 | MYT1 | TUBB6 |
| ccsbBroad304_10986 | ccsbBroad304_08471 | -0.4582101 | MYT1 | AGGF1 |
| ccsbBroad304_10986 | ccsbBroad304_04931 | -0.4466559 | MYT1 | FOXR2 |
| ccsbBroad304_10986 | ccsbBroad304_04798 | -0.4455448 | MYT1 | KRT40 |
| ccsbBroad304_10986 | ccsbBroad304_16059 | -0.444838 | MYT1 | CCDC74A |
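
For readers who want to reproduce this kind of list, here is a minimal sketch of ranking perturbations by cosine similarity to a query. It is not the code from the linked notebook: the function name `top_similar`, the assumption of one consensus profile per row with a unique label, and the toy numbers are all hypothetical.

```python
import numpy as np
import pandas as pd


def top_similar(profiles: pd.DataFrame, query: str, n: int = 10):
    """Return the n most and n least cosine-similar rows to `query`.

    Assumes one consensus profile per row, a unique row label per
    perturbation, and numeric feature columns only (hypothetical layout).
    """
    values = profiles.to_numpy(dtype=float)
    # Scale each row to unit length so that a dot product is a cosine similarity.
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    sims = pd.Series(unit @ unit[profiles.index.get_loc(query)], index=profiles.index)
    sims = sims.drop(query)  # drop the trivial self-similarity of 1
    return sims.nlargest(n), sims.nsmallest(n)


# Toy profiles with made-up numbers (not JUMP data).
toy = pd.DataFrame(
    [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [-1.0, -2.0, 0.0], [0.0, 0.0, 1.0]],
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
top, bottom = top_similar(toy, "GENE_A", n=1)
print(top)
print(bottom)
```

In the toy data `GENE_B` points almost the same way as `GENE_A` and `GENE_C` points the opposite way, so they come out as the top correlated and top anti-correlated entries respectively.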

From: Anne Carpenter anne@broadinstitute.org Date: Thu, Dec 7, 2023 at 10:04 AM

Oh, awesome! I didn't even dare hope they were both existing in the ORF dataset.

So now these two genes are anti-correlated in both CRISPR and ORF data; that is a nice story (and makes me much less worried about the inconsistent replicates in CRISPR).

FYI, I've started to write up this section of the paper in the MorphMap draft at the end of the results here. I talked to the researcher (Sui Wang) and she says she can have a lab member make constructs to do a reporter assay to add to this section in addition to the promoter-binding unpublished results I mentioned before.

Given how clean a story this is, I think for the paper we ought to make a full list of gene pairs that have this behavior: anti-correlating in both CRISPR and ORF datasets. This is the strongest evidence for them regulating each other and would be a nice finding (with one such pair, MYT1+RNF41, confirmed).
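
A rough sketch of how such a list could be assembled, assuming a square gene-by-gene similarity table is available for each dataset. The function name `shared_anticorrelated_pairs`, the cutoff of -0.4, and the toy values are hypothetical placeholders, not values used in the project.

```python
import pandas as pd


def shared_anticorrelated_pairs(orf_sim, crispr_sim, threshold=-0.4):
    """List gene pairs whose similarity is at or below `threshold` in both tables.

    Both inputs are assumed to be square gene-by-gene similarity DataFrames.
    The threshold is an arbitrary illustrative cutoff.
    """
    genes = sorted(set(orf_sim.index) & set(crispr_sim.index))
    rows = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:  # visit each unordered pair once
            orf, crispr = orf_sim.loc[g1, g2], crispr_sim.loc[g1, g2]
            if orf <= threshold and crispr <= threshold:
                rows.append({"Gene_1": g1, "Gene_2": g2, "ORF": orf, "CRISPR": crispr})
    return pd.DataFrame(rows, columns=["Gene_1", "Gene_2", "ORF", "CRISPR"])


# Toy similarity tables with made-up numbers (not JUMP data).
genes = ["GENE_A", "GENE_B", "GENE_C"]
orf = pd.DataFrame(
    [[1.0, -0.6, 0.2], [-0.6, 1.0, -0.5], [0.2, -0.5, 1.0]], index=genes, columns=genes
)
crispr = pd.DataFrame(
    [[1.0, -0.5, 0.1], [-0.5, 1.0, 0.3], [0.1, 0.3, 1.0]], index=genes, columns=genes
)
pairs = shared_anticorrelated_pairs(orf, crispr)
print(pairs)
```

Only genes present in both tables are compared, so a pair tested in just one modality is silently skipped. The nested loop is quadratic in the number of genes, which is fine for a sketch but would want vectorizing for the full dataset.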

afermg commented 9 months ago

Providing the email log that led to this collaboration (skipping my email to her).

Sui’s email to Anne:

We mainly study the function of Myt1 in neurons (previously in pancreas) in mice in vivo. As far as I know, the connection between Myt1 and RNF41/Insyn1 is novel.

I have just checked our Myt1 Cut&RUN data (unpublished data) using mouse retinas. Myt1 binds to the promoter of rnf41, indicating a direct regulation, but doesn’t bind to the Insyn1 gene. Myt1 mainly functions as a transcriptional repressor.

My email to Sui after our chat Dec 6:

Please do share our results & plan with your lab member who might like to work on this project. I am happy that our data uncovered something interesting and that it can benefit you also.

The plan is that you will send us:

  1. Cut & run track data for the particular gene (not the full dataset, because you will be publishing that later; if our reviewers complain about this, hopefully it will be middle of 2024 and perhaps you will be getting ready to preprint that data by then anyway).
  2. Reporter results from 293 cells (you hope they behave well so that we don’t need to do the experiment in U2OS where we originally found the gene-gene relationship).
  3. Author information for the researchers who should be added to the paper for sharing the cut & run data and the reporter data.

We will continue to work on:

Our 1st big paper (nickname “MorphMap”) will be written in this doc here (https://docs.google.com/document/d/160QYCeJXMJMPgvYcirkkKaY99r6dZq6FMqYGE61CfFc/edit#) so you can see where your results will go (towards the end, as demonstration that findings from our data are useful!). If your results are ready by end of January it would be ideal to put in this 1st paper. If we are ready to submit but the reporter results aren’t, we can decide whether to wait for them or put the whole story in our 2nd big paper about this dataset.

Her reply:

Nice to talk with you😊

I will send you the figure ASAP.

Your plan sounds great to me.

My response:

Update! Attached are the genes that most correlate / anticorrelate with MYT1 overexpression (the results we talked about so far are CRISPR knockout), in terms of morphology.

You can see RNF41 is the top anti-correlating gene here, too, so that is very nice confirmation!

The rest of the list may interest you in general, as these would be likely potential important genes in these pathways. We already have enough for a nice story for MYT1-RNF41 but let me know if you see anything else interesting in here for future work and feel free to follow it up.

niranjchandrasekaran commented 9 months ago

@zahrahanifehlou shared the following

I have created a list of genes that are top correlated and anti-correlated with MYT1 and RNF41 in CRISPR. Please find the list in the attached file.

gene_corr_list.xlsx

to which I replied

Thank you for sharing the list. I took a look at it and found that RNF41 was not in MYT1's list of anti correlated genes. Could it be because this list was generated using all genes and not just replicable genes? I ask because for the ORF dataset, I only kept the replicable genes for generating the list (this was my code) and for the correlation heatmaps.

Anne, since only replicable genes are in the heatmap, it perhaps further alleviates our concern about the replicates not looking alike in the CRISPR data (assuming the heatmap was generated using the same code as that of the ORF data).

Zahra replied

In the CSV file that was sent, I didn't consider replicability when searching for correlated and anticorrelated profiles. If we consider replicability with "q_value==True", MYT1 and RNF41 are not replicable. And if we consider "p_value==True", only MYT1 is replicable. In the "MYT1.nlargest.csv" and "MYT1.nsmallest.csv" files, you can find crispr profiles that are correlated/anticorrelated (with MYT1) and replicable ( "p_value==True").

Additionally, I computed similarity across all replicates (without median aggregation), and the results are available in "final.csv".

Also, this plot shows the similarity between MYT1 and RNF41 across different runs (replicates): image

MYT1.xlsx
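
To make the "similarity across all replicates (without median aggregation)" step concrete, here is a small sketch that compares every replicate of one gene with every replicate of another. The function name `replicate_similarity`, the `gene` label column, and the toy values are assumptions for illustration only; this is not the code that produced `final.csv`.

```python
import numpy as np
import pandas as pd


def replicate_similarity(profiles: pd.DataFrame, gene_a: str, gene_b: str,
                         label_col: str = "gene") -> pd.DataFrame:
    """Cosine similarity between each replicate of gene_a and each replicate of gene_b.

    Assumes one replicate (well-level) profile per row, a label column naming
    the perturbed gene, and numeric feature columns otherwise.
    """
    features = profiles.drop(columns=[label_col])
    rows_a = features.loc[profiles[label_col] == gene_a]
    rows_b = features.loc[profiles[label_col] == gene_b]
    a = rows_a.to_numpy(dtype=float)
    b = rows_b.to_numpy(dtype=float)
    # Normalize rows so the matrix product holds cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return pd.DataFrame(a @ b.T, index=rows_a.index, columns=rows_b.index)


# Toy replicate profiles with made-up numbers (not JUMP data).
toy = pd.DataFrame({
    "gene": ["GENE_A", "GENE_A", "GENE_B", "GENE_B"],
    "f1": [1.0, 0.0, -1.0, 1.0],
    "f2": [0.0, 1.0, 0.0, 0.0],
})
sims = replicate_similarity(toy, "GENE_A", "GENE_B")
print(sims)
```

Each cell of the result is one replicate pair, which is the layout needed to spot cases where only some replicates carry the anti-correlation.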

niranjchandrasekaran commented 9 months ago

Hi @zahrahanifehlou, thanks for sharing these. Your heatmap replicates the heatmap that @afermg had previously shared, which is good. But I had a question about your following statement

If we consider replicability with "q_value==True", MYT1 and RNF41 are not replicable. And if we consider "p_value==True", only MYT1 is replicable.

I assumed that when Tomasz generated the top n pairs heatmap in the first comment in this issue, that was after filtering out non-replicable genes. Is that not true? I guess this is also a question for @tjetkaARD.

AnneCarpenter commented 9 months ago

Here is the "track file" from our collaborator Dr Sui Wang showing Myt1 binding to RNF41's promoter! unnamed

AnneCarpenter commented 9 months ago

STATUS: this project is awaiting reporter assay results from the Wang lab, although what we have already is a nice result on its own

Here is my summary from the manuscript main text (moving everything here now to track better)

Myt1 has the opposite impact as RNF41 (& to a lesser extent, INSYN1) in CRISPR data.

Emailed suiwang@stanford.edu (https://mail.google.com/mail/u/0/#sent/QgrcJHsHnNjJtWszNdwGctjDfRPknnWjVHl) because of their current R01 grant on Myt1.

IN PROGRESS - we met Dec 6! The plan is that Sui will send us:

  1. DONE - Cut & run track data (not the full dataset, because you will be publishing that later; if our reviewers complain about this, hopefully it will be middle of 2024 and perhaps you will be getting ready to preprint that data by then anyway).
  2. Reporter results from 293 cells (we hope they behave well so that we don’t need to do the experiment in U2OS where we originally found the relationship).
  3. Author information for the researchers who should be added to the paper for sharing the cut & run data.

We will continue to work on:

AnneCarpenter commented 9 months ago

Should doublecheck the story with the info provided here now https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/8

tjetkaARD commented 8 months ago

As mentioned here: https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/8 the assignment of a significant anti-correlation between MYT1 and RNF41 in the CRISPR dataset was a mistake on my side, originating from the ORF data.

Nonetheless, there is some other evidence to fill in this story, at least from a molecular perspective:

  1. There are only two studies with raw data for gene perturbations of MYT1: https://doi.org/10.1002/jcb.26636 and https://doi.org/10.1038/s41388-020-1268-6

  2. The correlation of CRISPR effects, as measured by pooled CRISPR and viability screens, is negative (-15%) between RNF41 and MYT1 (in cancer cell lines, DepMap data). Each gene is within the top 150 anti-correlated genes for the other.

image

  3. There is almost no correlation between the two genes in co-expression datasets.
image

AnneCarpenter commented 8 months ago

Just clarifying here: if I understand, these two genes DO anti-correlate in JUMP ORF and DO NOT correlate in JUMP CRISPR, do I have it correct?

And are we sure that both genes "have a phenotype" (ie are above our threshold for being distinct from negative control)?

tjetkaARD commented 8 months ago

JUMP-ORF: anti-correlation; coefficient: -0.65 (cosine similarity)


JUMP-CRISPR: insignificant result; coefficient: 0.10 (cosine similarity)

AnneCarpenter commented 8 months ago

Thank you! It seems the remaining tasks here are:

auranic commented 7 months ago

Of note, in GNN KG scores, this connection looks quite unknown:

image

tjetkaARD commented 7 months ago

Just one note about the MYT1 gene name (I almost fell into this trap myself, so adding the note ;). There are two separate genes/proteins that have been named "MYT1":

  1. the currently official: MYT1 from CRISPR/ORF dataset, MYT1 = Myelin transcription factor 1
  2. the currently official: PKMYT1, Protein kinase, membrane associated tyrosine/threonine 1, whose previous symbol was MYT1.

The second one is quite a hot drug target right now; in the literature the two names are often used interchangeably.

niranjchandrasekaran commented 4 months ago

connections

Here is what the results look like in the most recent version of ORF and CRISPR profiles

| ORF | MYT1 | RNF41 | INSYN1 |
|---|---|---|---|
| MYT1 | 1 | -0.56 | -0.3 |
| RNF41 | | 1 | 0.4 |
| INSYN1 | | | 1 |

CRISPR

The KG scores are between -0.21 and 0.3 for these connections, which fall in the unknown connections category.

Here are the cell images of the three genes

RNF41: https://phenaid.ardigen.com/static-jumpcpexplorer/images/source_4/BR00121547/C03_4.jpg MYT1: https://phenaid.ardigen.com/static-jumpcpexplorer/images/source_4/BR00121562/I08_3.jpg INSYN1: https://phenaid.ardigen.com/static-jumpcpexplorer/images/source_4/BR00124794/K03_3.jpg

niranjchandrasekaran commented 3 months ago

Notebook

The heatmap shows the percentile of the cosine similarities (1 → similar, 0 → anti-similar). The text is the maximum of the absolute KG score (gene_mf__go, gene_bp_go, gene_pathway). I set a KG threshold (like we previously had) of 0.4. If a connection has a score lower than this threshold, it is considered to be unknown. The KG scores were downloaded from Google Drive: ORF and CRISPR. The diagonal of the heatmap indicates whether a gene has a phenotype (False could also mean the gene is not present in the dataset).
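
The two quantities described above can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the percentile is taken here as the fraction of gene pairs with a similarity at or below the pair of interest, and the helper names (`similarity_percentile`, `is_unknown_connection`) and all toy numbers are hypothetical. Only the 0.4 threshold and the three KG score names come from the comment.

```python
import numpy as np
import pandas as pd

KG_THRESHOLD = 0.4  # threshold quoted in the comment above


def similarity_percentile(sim: pd.DataFrame, gene_a: str, gene_b: str) -> float:
    """Fraction of distinct gene pairs whose similarity is <= that of (gene_a, gene_b)."""
    values = sim.to_numpy(dtype=float)
    upper = values[np.triu_indices_from(values, k=1)]  # each unordered pair once
    return float((upper <= sim.loc[gene_a, gene_b]).mean())


def is_unknown_connection(kg_scores: dict, threshold: float = KG_THRESHOLD) -> bool:
    """A connection is 'unknown' if its largest absolute KG score is below the threshold."""
    return max(abs(score) for score in kg_scores.values()) < threshold


# Toy similarity matrix and KG scores with made-up numbers (not JUMP data).
genes = ["GENE_A", "GENE_B", "GENE_C"]
sim = pd.DataFrame(
    [[1.0, -0.6, 0.2], [-0.6, 1.0, 0.5], [0.2, 0.5, 1.0]], index=genes, columns=genes
)
low = similarity_percentile(sim, "GENE_A", "GENE_B")
high = similarity_percentile(sim, "GENE_B", "GENE_C")
unknown = is_unknown_connection(
    {"gene_mf__go": 0.05, "gene_bp_go": -0.35, "gene_pathway": 0.1}
)
print(low, high, unknown)
```

With this definition the most anti-similar pair gets the smallest percentile and the most similar pair gets 1, which matches the direction of the heatmap scale described above.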

The results are the same as what I had in the above comment: https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/issues/3#issuecomment-2105235343

ORF

ORF-connections-INSYN1-MYT1-RNF41

CRISPR

CRISPR-connections-INSYN1-MYT1-RNF41

AnneCarpenter commented 3 months ago

Here it seems the remaining steps are:

Niranj, please let ppl know if you want them to do any of the above.

AnneCarpenter commented 2 months ago

Update from Sui Wang: the reporter results were inconclusive, but used a mouse version of Myt1 so we will not include these results in the paper, leaving it with just the cut & Run results. So the biological followup part is done. In more detail, here's a summary of the results:

"It is a human RNF41 promoter driving GFP expression. We overexpressed mouse Myt1 (we do not have human Myt1 cDNA). No repression of GFP was detected by overexpressing Myt1. In fact, GFP intensity seemed to be increased."

Some brainstorms and responses from Sui:

1) Maybe the overexpressed mouse Myt1 binds up the human Myt1 (Q: do they dimerize?) and makes slightly less effective repressors.

Sui says: "I believe they do not dimerize. It is totally possible that the mouse Myt1 doesn’t bind human sequences well enough. Usually, TF bindings are conservative. We did not see rnf41 expression in the retina (So, we may not want to put lots of efforts on this)."

2) Maybe overexpressed mouse Myt1 binds to the promoter, displacing a slightly more effective human Myt1 repressor (Q: this depends on whether you think it’s possible that the mouse Myt1 could be a bit less effective a repressor in human cells, and assuming your experiment was in human cells?)

Sui says: "We used HEK293T cells. These cells should not express Myt1 endogenously (We can double check)."

3) What if, in the ORF Cell Painting experiments, overexpressing the Myt1 repressor was dominant negative? But inhibiting this repressor should cause RNF41 expression to increase - but that really should make the two genes matches, not anti-matches. So I don’t think this is plausible unless overexpressing RNF41 also produces a dominant negative phenotype. One could consider even more complex explanations - that these two genes are doing what we expect, but they also impact other genes that then have feedback loops or whatever, such that nothing follows logic :D

Sui says: "I agree. The regulation could be much complicated than we thought."

niranjchandrasekaran commented 2 months ago

Notebook

@AnneCarpenter

figuring out why the replicates are inconsistent. This is a concern we noted above - if it was just for CRISPR data we can perhaps ignore but if we were worried about ORFs we could plot the replicates individually to get a sense of whether there's a concern.

In ORF, the replicates are consistent.

MYT1-RNF41_similarities

figuring out what is the morphology change that makes the genes look the ‘opposite’ of each other. Looking by eye, we didn’t see a dramatic change but this is usually the case. See notes above for a recommendation to just do a simple version of this.

Here is what the feature group analysis looks like

MYT1-RNF41_area_size_compartment

MYT1-RNF41_feature_group_channel

These are cosine similarity values for the consensus profiles of MYT1 and RNF41. It looks like the opposite signature is present in all compartments and channels across all feature groups.

Note: These results are from profiles that have gone through the following processing steps: wellpos_cc_var_mad_outlier.
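
A minimal sketch of this kind of per-group comparison, assuming CellProfiler-style feature names of the form `Compartment_FeatureGroup_...`. The function name `similarity_by_group` and the toy profiles are hypothetical, and the real analysis also splits by channel, which is omitted here.

```python
import numpy as np
import pandas as pd


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_by_group(profile_a: pd.Series, profile_b: pd.Series) -> pd.Series:
    """Cosine similarity of two profiles within each compartment/feature-group subset.

    Assumes both Series share the same feature names and that the first two
    underscore-separated tokens are the compartment and the feature group.
    """
    keys = ["_".join(name.split("_")[:2]) for name in profile_a.index]
    result = {}
    for key in sorted(set(keys)):
        cols = [name for name, k in zip(profile_a.index, keys) if k == key]
        result[key] = cosine(profile_a[cols].to_numpy(), profile_b[cols].to_numpy())
    return pd.Series(result)


# Toy consensus profiles with made-up numbers (not JUMP data).
features = [
    "Cells_AreaShape_Area",
    "Cells_AreaShape_Perimeter",
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Nuclei_Intensity_MaxIntensity_DNA",
]
gene_a = pd.Series([1.0, 2.0, 3.0, 1.0], index=features)
gene_b = pd.Series([-1.0, -2.0, 3.0, 1.0], index=features)
by_group = similarity_by_group(gene_a, gene_b)
print(by_group)
```

In the toy example only the `Cells_AreaShape` subset is opposite. A result like the one reported above would instead show negative values in every group.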

AnneCarpenter commented 1 month ago

@niranjchandrasekaran I went to write (update) this section of the paper but am confused because only ORF info is shown in the last comment, so I wonder if the CRISPR info earlier in the thread is still correct (that is, uses the right profiles) and therefore whether we are confident there's no signal in CRISPR data (that is, this is still true: "Both MYT1 and RNF41 have a phenotype, but the cosine similarity is only 0.03").

I'm also confused because the ORF correlation values for the individual replicates are around -0.3 but the May 10 table says -0.56 for gene-level correlation in ORFs. Is that consistent? I see the ORF plot is part d of what is currently called Figure 8 but the numbers are KG values and not correlations so that doesn't address this Q.

niranjchandrasekaran commented 1 month ago

that is, this is still true: "Both MYT1 and RNF41 have a phenotype, but the cosine similarity is only 0.03")

Yes, can confirm.

I'm also confused because the ORF correlation values for the individual replicates are around -0.3 but the May 10 table says -0.56 for gene-level correlation in ORFs. Is that consistent?

Yes, still consistent.
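
The thread does not spell out why the two numbers are consistent. One plausible reason is that aggregating replicates averages out replicate-specific noise, so the gene-level similarity is stronger than a typical replicate-to-replicate similarity. The simulation below illustrates that effect only. The opposite-signal model, the noise level, the number of replicates, and the use of the median for aggregation are all assumptions, and none of the numbers are JUMP data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_replicates, noise = 500, 5, 1.5  # arbitrary illustrative settings

# Two hypothetical genes with exactly opposite signals plus independent replicate noise.
signal = rng.normal(size=n_features)
gene_a = signal + noise * rng.normal(size=(n_replicates, n_features))
gene_b = -signal + noise * rng.normal(size=(n_replicates, n_features))


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Average similarity over all replicate pairs (no aggregation).
replicate_level = float(np.mean([cosine(a, b) for a in gene_a for b in gene_b]))
# Similarity of the median-aggregated (consensus) profiles.
consensus_level = cosine(np.median(gene_a, axis=0), np.median(gene_b, axis=0))

print(f"replicate-level mean: {replicate_level:.2f}")
print(f"consensus-level:      {consensus_level:.2f}")
```

Under these assumptions the consensus similarity comes out more negative than the replicate-level mean, which is the same direction as the gap between about -0.3 and -0.56 discussed above.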

niranjchandrasekaran commented 4 weeks ago

Notebook

This cluster is not affected by plate layout.

ORF-plate-layout-MYT1-INSYN1-RNF41