Closed AnneCarpenter closed 3 months ago
I am not sure if this is still seen in the Evotec results, but there is a strong correlation between NAT14 and RAB30 in ORF (0.7).
This connection is novel.
The heatmap shows the percentile of the cosine similarities (1 → similar, 0 → anti-similar). The text is the maximum of the absolute KG scores (`gene_mf__go`, `gene_bp_go`, `gene_pathway`). I set a KG threshold (like we previously had) of 0.4. If a connection has a score lower than this threshold, it is considered unknown. The KG scores were downloaded from Google Drive: ORF and CRISPR. The diagonal of the heatmap indicates whether a gene has a phenotype (`False` could also mean the gene is not present in the dataset).
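A minimal sketch of the two quantities described above, for anyone reproducing the heatmap annotation. The function names and the example scores are hypothetical; only the 0.4 threshold and the three KG score names come from the comment, and the percentile definition used here (fraction of values less than or equal to the query) is an assumption.

```python
KG_THRESHOLD = 0.4  # threshold quoted in the comment above


def max_abs_kg_score(scores):
    """Largest absolute value among the KG scores for one gene pair."""
    return max(abs(value) for value in scores.values())


def is_known_connection(scores, threshold=KG_THRESHOLD):
    """A connection counts as known when its max absolute KG score reaches the threshold."""
    return max_abs_kg_score(scores) >= threshold


def percentile_rank(value, population):
    """Fraction (0 to 1) of values in `population` that are <= `value`."""
    return sum(v <= value for v in population) / len(population)


# Made-up scores for illustration only, not real KG values.
example_pair = {"gene_mf__go": 0.1, "gene_bp_go": -0.2, "gene_pathway": 0.3}
print(is_known_connection(example_pair))
```

With these made-up scores the pair falls below the threshold, so it would be labelled as an unknown connection.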
It looks like the next step is to email researchers working on these; I asked Holger at Evotec to do so but he never replied so the thread was lost. Can you recap the story, suitable for pasting into an email?
I think it's something like this: "Overexpression of these two genes yields morph profiles that strongly correlate, and a relationship between the genes is unknown. We do not see a morph impact of either gene when knocked down by CRISPR."
Is that all there is to say? @niranjchandrasekaran
Hi @AnneCarpenter - I just searched both of these genes in many databases and came to the same conclusions as you. RAB30 seems fairly well characterized, while NAT14 has only predicted function. The top papers related to NAT14 are either generic functional genomics studies where it was one hit out of many, or papers that rule out NAT14 for particular functional roles within pathways.
I think that your succinct summary above captures the situation. Are we still planning on reaching out to researchers?
I don't think there is time to do so now, unfortunately, but this could still be a nice story for the paper.
I think @niranjchandrasekaran would need to make sure the relationship still holds in the latest data first, then run the analysis to show what features are key.
I think looking at the images themselves (guided by what features are key) may be enough of a story if it's a visible phenotype. The fact that Golgi seems involved indicates it may be visible in Cell Painting!
Ok, sounds good. I will find images of these two perturbations, and wait for @niranjchandrasekaran to extract key features.
Great - Alán's tools will help you get images, though the colors will be merged which may not suit the goal.
So it's not clear to me that there is a visible phenotype from the images, but maybe someone else with more experience looking at images can see something. Here are RAB30 and NAT14, along with controls from the same plate:
awesome, let's see what Niranj's list of features tells us so we know what to look at.
ps @jessica-ewald would it be useful to note how you retrieved these images in case others do the same?
I can add the notebook to the repo with a pull request. I used Alan's library to retrieve the images, then wrote my own function to rescale/display them.
Notebook with function here: https://github.com/broadinstitute/2023_12_JUMP_data_only_vignettes/blob/main/notebooks/display_orf_images.ipynb
Here is the list of top features that are significantly different between NAT14, RAB30 and the negative controls: sorted by p values of NAT14 features and sorted by p values of RAB30 features.
Here is the list of all features that are significantly different for both genes.
Note: I removed `ObjectNumber`, Location (features with `X` and `Y` in their names) and correlation features. I have also removed features which measure similar things.
I also looked at the similarity of features, grouped by feature groups, compartment and channels, between the two clusters in ORFs. The similarity is low for AreaShape features. All other feature groups are similar.
I couldn't open the list files, it wouldn't let me decompress them which is surprising.
But overall based on the chart, this set of features looks fishy - very little change in Area/shape but then all the channels and all the texture/intensity features affected. I wonder what is going on and maybe looking at the features will make it more clear, esp the list of all features that are significantly different for both genes.
@niranjchandrasekaran can you please remind me what the plot shows? You said "similarity of features" (would that be correlations?) But I wonder if the plot might instead be the "list of all features that are significantly different for both genes" categorized (but then I'm not sure what the numerical value is).
> can you please remind me what the plot shows? You said "similarity of features" (would that be correlations?) But I wonder if the plot might instead be the "list of all features that are significantly different for both genes" categorized (but then I'm not sure what the numerical value is).
The way I do it is to create mini profiles using only those features in each group/compartment/channel and then find the cosine similarity between these profiles for the two genes.
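A rough sketch of that idea: split each gene's profile into per-group "mini profiles" and take the cosine similarity within each group. This is not the code used for the plot. The function names are hypothetical, and it assumes feature names follow a `Compartment_FeatureGroup_...` pattern so the group can be read from the second token.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (nan if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else float("nan")


def feature_group(name):
    """Assumed naming convention: Compartment_FeatureGroup_... (e.g. Cells_AreaShape_Area)."""
    return name.split("_")[1]


def similarity_by_group(profile_a, profile_b, group_of=feature_group):
    """Build a mini profile per group from the shared features and compare the two genes."""
    groups = defaultdict(list)
    for name in profile_a:
        if name in profile_b:
            groups[group_of(name)].append(name)
    return {
        group: cosine_similarity(
            [profile_a[n] for n in names], [profile_b[n] for n in names]
        )
        for group, names in groups.items()
    }
```

Passing a different `group_of` function (for example one that returns the compartment or the channel) gives the other two groupings mentioned above.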
Thanks! In checkin Niranj noted that these two genes are nearby (same column, same plate) so it does point to a technical artifact. Next step is Jess will check if other similar genes are also nearby (or generally look at similarity to these two genes in a plate layout view). It may rule out this cluster but also point to some technical issue that needs filtering.
This isn't looking good. These are the top genes that are most similar to both RAB30 and NAT14. They are all either on the same plate or in the same batch. Also, many of them are in the same column. I think this story might be a dead end. I need to check the other “novel” connections to ensure the connections are not explained by layout.
Metadata_Symbol | Metadata_Plate | Metadata_Well | Metadata_Batch |
---|---|---|---|
CLDN3 | BR00123947 | I01 | 2021_06_07_Batch5 |
SRI | BR00123947 | O01 | 2021_06_07_Batch5 |
SOCS2 | BR00123947 | M01 | 2021_06_07_Batch5 |
ALMS1P1 | BR00123947 | C01 | 2021_06_07_Batch5 |
PPCDC | BR00123947 | A01 | 2021_06_07_Batch5 |
RAB30 | BR00123947 | G01 | 2021_06_07_Batch5 |
NAT14 | BR00123947 | E01 | 2021_06_07_Batch5 |
TREML2 | BR00123952 | I01 | 2021_06_07_Batch5 |
IL26 | BR00123952 | C13 | 2021_06_07_Batch5 |
IL26 | BR00123952 | C01 | 2021_06_07_Batch5 |
ASPDH | BR00123952 | G01 | 2021_06_07_Batch5 |
CRYGS | BR00123957 | C02 | 2021_06_07_Batch5 |
TM2D2 | BR00123957 | C01 | 2021_06_07_Batch5 |
CEP104 | BR00123957 | G01 | 2021_06_07_Batch5 |
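
One way to quantify the pattern in this table is to count how the listed wells distribute over plates and plate columns. The sketch below does that with the rows copied from the table; the helper names are hypothetical, and it assumes well names are a row letter followed by a two-digit column.

```python
from collections import Counter

# (symbol, plate, well) rows copied from the table above.
TOP_HITS = [
    ("CLDN3", "BR00123947", "I01"), ("SRI", "BR00123947", "O01"),
    ("SOCS2", "BR00123947", "M01"), ("ALMS1P1", "BR00123947", "C01"),
    ("PPCDC", "BR00123947", "A01"), ("RAB30", "BR00123947", "G01"),
    ("NAT14", "BR00123947", "E01"), ("TREML2", "BR00123952", "I01"),
    ("IL26", "BR00123952", "C13"), ("IL26", "BR00123952", "C01"),
    ("ASPDH", "BR00123952", "G01"), ("CRYGS", "BR00123957", "C02"),
    ("TM2D2", "BR00123957", "C01"), ("CEP104", "BR00123957", "G01"),
]


def well_column(well):
    """Column part of a well name, e.g. 'G01' -> '01' (assumes one row letter)."""
    return well[1:]


def layout_summary(rows):
    """Count how many of the listed wells fall on each plate and in each plate column."""
    return {
        "plate": Counter(plate for _, plate, _ in rows),
        "column": Counter(well_column(well) for _, _, well in rows),
    }


summary = layout_summary(TOP_HITS)
print(summary["plate"].most_common())
print(summary["column"].most_common())
```

For these 14 rows, 12 wells sit in column 01 and all of them fall on three plates from one batch, which is the layout pattern being flagged.
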
Really glad we caught it! If we are lucky this just means a single plate needs to be thrown out or something. I wonder what the best way is to get an overview of how things look after all the steps/batch correction we did. Didn't @alxndrkalinin look at some plate layouts early on (not sure whether it was ORFs, CRISPRs or compounds)? If so Alex can you point to code you used to get an overview of features in plate layout format?
I think I would've recommended that we look at something like cell count, cell size, and then some random thing like cytoplasm mito intensity in plate layout view for every plate in a given dataset, just laying them all out to get a view.
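A small sketch of what a plate layout view needs underneath: per-well values for one feature arranged into the physical grid, plus a per-column summary. It assumes 384-well plates (rows A to P, columns 01 to 24), and the function names are hypothetical. It is not the plotting code used for the project.

```python
import string

ROWS = string.ascii_uppercase[:16]  # assumed 384-well plate: rows A..P
N_COLS = 24                         # assumed 384-well plate: columns 01..24


def plate_grid(well_values):
    """Arrange {well_name: value} for one plate and one feature into a 16 x 24 grid."""
    grid = [[None] * N_COLS for _ in ROWS]
    for well, value in well_values.items():
        row = ROWS.index(well[0])
        col = int(well[1:]) - 1
        grid[row][col] = value
    return grid


def column_means(grid):
    """Mean value per plate column, ignoring empty wells (None if the column is empty)."""
    means = []
    for col in range(N_COLS):
        values = [row[col] for row in grid if row[col] is not None]
        means.append(sum(values) / len(values) if values else None)
    return means
```

The grid can be handed to any heatmap function, one panel per plate, for features such as cell count or a mean intensity.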
So idk what's going on here, but when I pull location info for all genes using jump_portrait, it says that RAB30 and NAT14 are in the same physical wells:
This can't possibly be right - what does the metadata that @niranjchandrasekaran has say? I see the two wells in his screenshot above, but this further confuses me because in the jump_portrait metadata there are many wells per gene.
Yep, here's the code for plotting feature values across plate layout, plus example notebook showing such plots, and plots that we generated for ORF data.
@jessica-ewald I suspect there is some kind of mapping error in jump-portrait. I checked the contents of the wells you shared in your screenshot. All wells that don't contain `NAT14` or `RAB30` seem to be negative controls.
Metadata_Plate | Metadata_Well | Metadata_Symbol |
---|---|---|
BR00123947 | A21 | LUCIFERASE |
BR00123947 | C05 | BFP |
BR00123947 | C08 | BFP |
BR00123947 | E01 | NAT14 |
BR00123947 | E02 | LacZ |
BR00123947 | E07 | HcRed |
BR00123947 | E14 | LUCIFERASE |
BR00123947 | F22 | LacZ |
BR00123947 | G01 | RAB30 |
BR00123947 | J05 | LUCIFERASE |
BR00123947 | J11 | BFP |
@niranjchandrasekaran we figured it out. There was an extra flag in the function call that I didn't know to use to retrieve the perturbation instead of the negative controls matched to the perturbation of interest.
So - this connection still seems like it could be explained by well position, but at least there isn't something totally wrong going on here 😁
> I think I would've recommended that we look at something like cell count, cell size, and then some random thing like cytoplasm mito intensity in plate layout view for every plate in a given dataset, just laying them all out to get a view.
@AnneCarpenter Erin previously ran this: https://github.com/jump-cellpainting/morphmap/issues/6#issuecomment-2136035514
Ok - I'm going to drop pursuing this story from my list. Let me know if there are any other action items for me!
Thanks, Niranj, that link to Erin's issue helps. Here are the plate views for an intensity metric (Meanintensity AGP in cells) for the plate of interest here - at the top left (BR00123947). The first column has all of the X01 wells. Honestly I was hoping this plate would be some obvious terrible outlier but that doesn't seem to be the case. Note it's a bit hard to tell in these plots because each plate has its own scalebar - most seem to have most samples between + and - 2 and most have the same pattern of lower values in the middle/lower part of the plate than upper/sides.
Now, I don't know exactly what stage of profiles these are (the issue says before sphering and harmony - that's the only way we can get feature names) but I guess if the profiles we're looking at in these plate layouts are AFTER plate layout correction then it's not great that a relatively subtle plate layout like this yields something so misleading in the connections between genes.
I'm trying to recall and need @alxndrkalinin help - didn't we attempt a thing where we tried to mean-average each well position across all plates in the experiment? That would have gotten rid of this pattern but I don't know if that ended up in final profiles and/or if that step happens after sphering/harmony (I don't think that is the case).
In the meantime, it's pretty clear this pairing of genes is artifactual, based on Niranj finding the nearby wells all rank similarly highly to each other.
I think we should include a warning about this in the paper... I guess as a supplemental figure and a warning to check for any pairing if it can be explained by proximity in wells/plates/batches? I mean, ideally we would fix the data so this never happens, but since that's not practical I think all we can do is offer a way for people to check if this is what is happening. @niranjchandrasekaran could you write a sentence or two in the paper and a pointer to a supp figure suggesting the steps to avoid getting fooled?
> I'm trying to recall and need @alxndrkalinin help - didn't we attempt a thing where we tried to mean-average each well position across all plates in the experiment? That would have gotten rid of this pattern but I don't know if that ended up in final profiles and/or if that step happens after sphering/harmony (I don't think that is the case).
Alex, correct me if I am wrong. We do subtract the mean feature value of each well position from each feature, and that is the first step along the current profile processing pipeline (for ORF and CRISPR).
> I think we should include a warning about this in the paper... I guess as a supplemental figure and a warning to check for any pairing if it can be explained by proximity in wells/plates/batches? I mean, ideally we would fix the data so this never happens, but since that's not practical I think all we can do is offer a way for people to check if this is what is happening. @niranjchandrasekaran could you write a sentence or two in the paper and a pointer to a supp figure suggesting the steps to avoid getting fooled?
Will do.
Closing this issue as we won't include this in the manuscript.
> I'm trying to recall and need @alxndrkalinin help - didn't we attempt a thing where we tried to mean-average each well position across all plates in the experiment? That would have gotten rid of this pattern but I don't know if that ended up in final profiles and/or if that step happens after sphering/harmony (I don't think that is the case).
>
> Alex, correct me if I am wrong. We do subtract the mean feature value of each well position from each feature, and that is the first step along the current profile processing pipeline (for ORF and CRISPR).
We did implement this step, and it was first, but that was before @johnarevalo's optimization of all preprocessing steps. I'm not sure if it made it into the final version.
Yes, it was included in the ORF and CRISPR pipeline. It is not part of the COMPOUND pipeline.
If the parquet file was produced with the Snakemake implementation, then the filename should have the `wellpos` string.
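For readers unfamiliar with the well-position step discussed above, here is a minimal sketch of the idea for a single feature: average each well position across all plates, then subtract that average from every well at that position. This illustrates the concept only and is not the Snakemake pipeline's implementation. The function name and the record layout are hypothetical.

```python
from collections import defaultdict


def subtract_well_position_mean(records):
    """Center one feature by well position.

    `records` is a list of (plate, well, value) tuples. The mean of each well
    position is taken across all plates and subtracted from each value.
    """
    values_by_well = defaultdict(list)
    for _, well, value in records:
        values_by_well[well].append(value)
    well_mean = {well: sum(vals) / len(vals) for well, vals in values_by_well.items()}
    return [(plate, well, value - well_mean[well]) for plate, well, value in records]
```

Note that this removes only the layout effect shared by every plate; a pattern specific to one plate or one batch would survive it.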
Strong plate layout effects.
@jessica-ewald IIUC we can close this ?
I think this is already closed - we just had extra comments afterwards.
(Anne) RAB30-NAT14
Decided not to pursue bc it was in Evotec’s hit list before doing much normalization.
Emailed Evotec Nov 20, 2023 for permission to share w others.
No papers with both genes mentioned.
Looked up NAT14 papers and chose the 2 from past decade (see email).
RAB30 has more papers (20 in the past decade).
Only one senior author has 2 papers in past decade: Nakagawa I.
A second lab has the only paper that jumps out as cell bio/molec (vs genetics or a review paper).

RAB30 regulates PI4KB (phosphatidylinositol 4-kinase beta)-dependent autophagy against group A Streptococcus. Nakajima K, Nozawa T, Minowa-Nozawa A, Toh H, Yamada S, Aikawa C, Nakagawa I. Autophagy. 2019 Mar;15(3):466-477. doi: 10.1080/15548627.2018.1532260. Epub 2018 Oct 18. PMID: 30290718.
Here, we elucidate a novel property of RAB30: the ability to recruit PI4KB (phosphatidylinositol 4-kinase beta) to the Golgi apparatus and GcAVs. ...Furthermore, we identify an interaction between RAB30 and PI4KB, in which the knockdown of RAB30 decreased the …

A role for Rab30 in retrograde trafficking and maintenance of endosome-TGN organization. Zulkefli KL, Mahmoud IS, Williamson NA, Gosavi PK, Houghton FJ, Gleeson PA. Exp Cell Res. 2021 Feb 15;399(2):112442. doi: 10.1016/j.yexcr.2020.112442. Epub 2021 Jan 5. PMID: 33359467.
Rab30 is a poorly characterized small GTPase. Here we show that Rab30 is localised primarily to the TGN and recycling endosomes in a range of cell types, including primary neurons; minor levels of Rab30 were also detected throughout the Golgi stack and early …