Nov 2020 Discussions (associations with znf436)

shntnu commented 3 years ago

From: Arora, Jatin jarora1@bwh.harvard.edu

Hello Anne, and everyone, May I ask for a little more guidance? You might remember that rare variant burden in znf436 was associated with multiple morphology traits. Znf436 is a component of MAPK pathways, and negatively regulates response to external stimuli and cell proliferation. I am trying to make sense of two associations of znf436: Nuclei’s granularity measured from Brightfield, and Cell AreaShape Zernike 5_1. I have attached a slide showing associations and well images.

(a) I am thinking that disruption of znf436’s function would decrease its regulatory effect in MAPK, leading to higher proliferation. Brightfield might not be very informative, but would it be safe to say that Nuclei’s granularity from Brightfield captures transcriptional activity?

(b) Zernike 5_1 is a high-order measurement, and might not be detectable by eye in images (please correct me if I am wrong). Given the two associations, does it make sense to say that, due to higher proliferation, cells squeeze to accommodate neighbors, which results in lower Cell AreaShape Zernike 5_1?

PastedGraphic-2

From: Beth Cimini bcimini@broadinstitute.org

I'm not sure we can realistically say either of those things with any high degree of confidence; a lot of things might change granularity, particularly in the brightfield channel, and if the cells are more "squeezed" I would feel much more comfortable saying that based on something like area or axis length rather than a hard-to-interpret Zernike.

From: Anne Carpenter anne@broadinstitute.org

I agree with what Beth said. It's worth noting that we have seen strong signatures of MAPK pathways in past experiments, when overexpressing genes (rather than mutating them, as in your project). So that is not a surprise and is reassuring - you can search our attached prior paper for MAPK and read more. Looking at the big picture, when the cells are as clearly crowded as they are, I would say the dominant phenotype is proliferation, and any morphology changes probably have to do with cells being crowded rather than an inherent phenotype. But indeed, the Zernike metric that scored is about the cells being almost round, with just one side of the cell not being perfectly filled out to the edge of the circle. I do think this is a cohesive, sensible story for this gene! It's not a very morphologically complex/unique story but at least it is sensible.

From: Ralda Nehme rnehme@broadinstitute.org

Do you see an association when examining only single, isolated cells (i.e. ones with no neighbors)? If the phenotype is related to density (cells being squeezed because of being crowded) then you wouldn't expect to see it in single/isolated cells. Otherwise, it could be "inherent".
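
A minimal sketch of this check in R, assuming a single-cell table with one row per cell. The column names `cell_line`, `n_neighbors` and `zernike_5_1` are hypothetical stand-ins for whatever the real neighbor-count and shape columns are called in the cmQTL profiles.

```r
# Toy single-cell table; column names are assumptions, not the real cmQTL schema.
cells <- data.frame(
  cell_line   = c("A", "A", "A", "B", "B", "B"),
  n_neighbors = c(0, 2, 0, 0, 1, 3),
  zernike_5_1 = c(0.10, 0.30, 0.12, 0.05, 0.25, 0.40)
)

# Keep only cells with no neighbors, then average the feature per cell line.
isolated <- cells[cells$n_neighbors == 0, ]
isolated_means <- aggregate(zernike_5_1 ~ cell_line, data = isolated, FUN = mean)
print(isolated_means)
```

The per-line means from isolated cells could then be fed into the same association test used for the full data, to see whether the signal survives.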

From: Arora, Jatin jarora1@bwh.harvard.edu

Thanks so much everyone for your knowledge, and also for the paper on mapk.

In intermediate cells, rare variant burden in znf436 is also associated with higher Cell AreaShape_extent (plot attached), which I guess is also about roundness of the cells, as we saw in the microscopy images. As you also pointed out, this could be related to the mitotic cell rounding process, i.e. proliferating cells being more round, and disruption of znf436 leads to proliferation. I agree this is not a very novel or unique association, and I would highlight it as support that our pipeline is able to capture genetics relevant for morphology.
In isolated cells, znf436 is nominally associated with Cells_AreaShape_Zernike_6_4 (plot attached), but the association is not significant in the permutation test. However, there is a significant positive association between rare variant burden in kcnk6 and Nuclei AreaShape Zernike 2_2. To be more intuitive, kcnk6 is positively associated with Nuclei AreaShape Eccentricity as well, which means disruption of kcnk6 causes cells to be less round. Kcnk6 is a potassium channel protein, whose opening or closing should affect cell membrane shape. I am not sure whether this would also affect the nucleus, which should be relatively rigid.
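
For readers unfamiliar with the permutation test mentioned above, here is a rough sketch of the idea on simulated data. It is not the actual cmQTL association model (which may include covariates); `burden` and `feature` are made-up vectors with one value per cell line.

```r
set.seed(1)
n <- 40
burden  <- rep(c(0, 1), times = c(30, 10))  # hypothetical carrier status per cell line
feature <- rnorm(n) + 1.5 * burden          # simulated trait with a built-in effect

# Observed effect: slope of the trait on burden.
obs <- unname(coef(lm(feature ~ burden))["burden"])

# Null distribution: shuffle the burden labels across cell lines and refit.
n_perm <- 1000
null <- replicate(n_perm, {
  b <- sample(burden)
  unname(coef(lm(feature ~ b))["b"])
})

# Two-sided empirical p-value, with the usual +1 correction.
p_perm <- (sum(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
print(c(effect = obs, p_perm = p_perm))
```

A nominal association whose observed effect sits well inside the shuffled distribution would get a large `p_perm`, which is the situation described for Zernike_6_4 in isolated cells.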

PastedGraphic-4

shntnu commented 3 years ago

@jatinarora-upmc asked

My specific doubt/question is about znf436, which is associated with cell solidity and nuclei’s granularity. If you look at the image of cell lines having a variant in znf436 (the bottom one, in the red circle), there are a lot more cells than in wild type (upper one). Now given that cell count is measured after plating, fixing and staining AND there was not enough time for the cells to proliferate (I guess so), I am confused whether the observed higher count is due to disrupted znf436 (which makes biological sense) or the number of plated cells at the beginning (for which we don’t have information).

shntnu commented 3 years ago

Now given that cell count is measured after plating, fixing and staining AND there was not enough time for the cells to proliferate (I guess so), I am confused whether the observed higher count is due to disrupted znf436 (which makes biological sense) or the number of plated cells at the beginning (for which we don’t have information).

@jatinarora-upmc I don't recollect that there was enough time for the cells to proliferate (@mtegtmey might be able to confirm)

If there wasn't enough time, the cell counts you have in the _count.csv files should be directly proportional to the plating density.

jatinarora-upmc commented 3 years ago

@shntnu I am summarizing a few points (sorry for the redundancy):

1. I learned from Emily that the number of seeded cells at the beginning was similar across wells (~10k per well).
2. There were 6 hrs between seeding and fixing for the cells to proliferate, if they can.
3. Cell count is measured from 8/9 images per well after seeding, fixing and staining. Overall, I guess the cell count in images reflects how well the cells survived the different steps (seeding, washing, staining etc.) + intrinsic cell line characteristics (e.g. loss of znf436 would not regulate MAPK and would enhance cell proliferation, or loss of some other gene would lead to apoptosis). So, it is difficult (maybe not possible with the given dataset) to disentangle the technical differences from biological differences.

Does it make sense?

shntnu commented 3 years ago

3. So, it is difficult (may be not possible with given dataset) to disentangle the technical differences

Can you make use of the fact that we have 8 (technical) replicates of each cell line?

jatinarora-upmc commented 3 years ago

@shntnu oh yeah, that's a good thing to remember. I guess if we have a high cell count across all replicates for a cell line, then it is more likely to be that cell line's intrinsic property. Right?

shntnu commented 3 years ago

@jatinarora-upmc

@shntnu oh yeah, that's a good thing to remember. I guess if we have a high cell count across all replicates for a cell line, then it is more likely to be that cell line's intrinsic property. Right?

I would think so. However, it is possible that all replicates of a cell line were plated at the same time, which could drive their cell counts to be similar (because they all had the same amount of time to grow), so one may not be able to fully untangle this. Worth checking with Emily (she isn't on GitHub).
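
One way to put a number on "high cell count across all replicates" is an intraclass correlation from a one-way ANOVA of counts by cell line. The sketch below uses simulated counts, with 8 replicate wells per line as in this experiment; `cell_line` and `count` are hypothetical column names, not the layout of the _count.csv files.

```r
set.seed(2)
k <- 8                                        # replicate wells per cell line
lines <- paste0("line", 1:6)
line_effect <- rnorm(length(lines), sd = 300) # simulated between-line differences
counts <- data.frame(
  cell_line = rep(lines, each = k),
  count = 2000 + rep(line_effect, each = k) + rnorm(length(lines) * k, sd = 100)
)

# One-way ANOVA: mean squares between and within cell lines.
fit <- anova(lm(count ~ cell_line, data = counts))
ms_between <- fit["cell_line", "Mean Sq"]
ms_within  <- fit["Residuals", "Mean Sq"]

# ICC(1): share of count variance attributable to cell line.
icc <- (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(icc)
```

A high value only says that replicates of a line agree with each other. As noted above, if all replicates of a line were plated together, it cannot separate plating density from an intrinsic proliferation phenotype.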

mtegtmey commented 3 years ago

@jatinarora-upmc my hunch would be that the higher cell counts per replicate are more closely linked to the cell count at plating, as opposed to a cell line specific phenotype (i.e. proliferation). It would be very tough to untangle this without a proliferation screen to compare against.

jatinarora-upmc commented 3 years ago

@shntnu @mtegtmey these plots show the number of cells imaged per replicate for cell lines having and not having rare variants in certain genes. There are a few observations here:

For the ZNF436 gene, replicates of a given cell line (same color dots in variant == 1) do not differ much in the number of cells. Overall, cell count is higher for cell lines carrying variants in the ZNF436 gene, whose disruption would lead to higher cell proliferation. This suggests that intrinsic characteristics of cell lines are more likely to create the difference in cell count than technical differences. Also, the cell lines having the variant were imaged on different plates.

In the comparison of cell count across cell lines having and not having rare variants in other genes (WASF2, WTIP) related to F-actin formation and cytoskeleton organisation, the difference in average cell count is not as large as for the ZNF436 gene, which regulates cell proliferation. So things fall in line well.

jatinarora-upmc commented 3 years ago

@shntnu @raldanehme @AnneCarpenter @bethac07 @mtegtmey tagging you here for follow up on znf436's associations.

In the last v2f call, we wondered whether the observed associations of rare variant burden in znf436 with Cells_AreaShape_Zernike_5_1 (effect size = -1.44) and Cells_AreaShape_Solidity (effect size = -1.05) in intermediate cells (1 to 3 neighbors) were confounded by differential cell count per cell line. So I looked at znf436's associations in isolated cells (0 neighbors), and it is clearly associated with cell area & shape properties there as well (beyond just nominal p < 0.05). Here are a few of those associations.

This lends strong support to the view that znf436's associations with cell area & shape in intermediate cells are not confounded.

Q1: Agreed?

Q2: I know higher-order Zernikes contain less information, but I am confused about what the difference between zernike_0_0 and higher orders (6_4 or 3_1) would be. I am trying to understand the opposite effect of znf436's disruption on these traits.

AnneCarpenter commented 3 years ago

Q1: it's a great sign that you still see the association even in isolated cells. I think if there are tons of cells, the isolated cells are still affected in the sense that they don't have much place to go without touching so if they don't want to touch others they might stay pretty rounded. In other words, cells don't need to physically touch each other to still be influenced by them.

Q2: Here is a guide to the Zernikes: https://en.wikipedia.org/wiki/File:Zernike_polynomials2.png Zernike0_0 should honestly have almost perfect correlation with one of the more commonly named shape metrics because it's really asking whether the cell matches a circle shape. For 3_1 you look at that pyramid for the one that says Z with a 1 on top and a 3 on the bottom (I think). You can see it has a red and blue stripe at the edges, and a red and blue blob in the middle. What this means: picture the shape of the cell superimposed on top… it will score high for this Zernike the more blue is covered and the more red you see - our cells aren't allowed to have holes in them, so i can imagine two cell shapes that would score highly: one is almost a perfect circle but just a little flattened at the red side. The other would be almost a crescent such that the middle red blob is exposed (but it’s not a great fit because a big chunk wouldn’t align well). 6_4 isn’t shown but you can follow the right hand side of the pyramid and see it would be mostly a circle with wiggly edges (probably not far off from a circle!). I'm a bit surprised that they'd be anticorrelated to 0_0, really.
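
To make the Zernike intuition concrete, here is a small sketch that evaluates Zernike moment magnitudes for binary shapes drawn on a unit disk. It uses the textbook radial polynomial and a (n + 1) / pi normalization; CellProfiler's exact scaling and its use of the object's enclosing circle may differ, so treat the numbers as illustrative only. `zernike_radial`, `zernike_magnitude` and the toy shapes are names made up for this example.

```r
# Radial Zernike polynomial R_n^m(r); requires n >= m >= 0 and (n - m) even.
zernike_radial <- function(n, m, r) {
  out <- numeric(length(r))
  for (k in 0:((n - m) / 2)) {
    coef <- (-1)^k * factorial(n - k) /
      (factorial(k) * factorial((n + m) / 2 - k) * factorial((n - m) / 2 - k))
    out <- out + coef * r^(n - 2 * k)
  }
  out
}

# Pixel grid covering the square that encloses the unit disk.
g <- seq(-1, 1, length.out = 401)
pix <- expand.grid(x = g, y = g)
pixel_area <- (g[2] - g[1])^2

# Magnitude of the (n, m) Zernike moment of a binary mask on the unit disk.
zernike_magnitude <- function(mask, n, m) {
  r <- sqrt(pix$x^2 + pix$y^2)
  theta <- atan2(pix$y, pix$x)
  keep <- mask & r <= 1
  basis <- zernike_radial(n, m, r[keep]) * exp(-1i * m * theta[keep])
  Mod(sum(basis) * pixel_area * (n + 1) / pi)
}

full_disk <- rep(TRUE, nrow(pix))   # shape fills the whole circle
flattened <- pix$x <= 0.8           # circle with one side cut off

z <- rbind(
  full_disk = c(z00 = zernike_magnitude(full_disk, 0, 0),
                z31 = zernike_magnitude(full_disk, 3, 1)),
  flattened = c(z00 = zernike_magnitude(flattened, 0, 0),
                z31 = zernike_magnitude(flattened, 3, 1))
)
print(round(z, 3))
```

Under this normalization 0_0 is simply the fraction of the circle the shape covers, which is why it should track the common shape metrics, while the flattened circle picks up a non-zero 3_1 that the full disk does not have.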

jatinarora-upmc commented 3 years ago

@AnneCarpenter this was a great explanation of Zernike moments, and really helped me. Thanks so much! Q2: so cells become more round (higher Zernike 0_0 and lower 3_1 and 6_4) with rare variant burden in znf436. Zernike_0_0 is not positively correlated with all other moments. Indeed, Zernike 0_0 has correlation > 0 with moments like 2_2, 3_3, 6_6 etc. and correlation < 0 with other Zernike moments like 3_1, 8_6 etc.

shntnu commented 3 years ago

@jatinarora-upmc also see this note https://github.com/broadinstitute/cmQTL/issues/32#issuecomment-648410790 which has some more intuitions about Zernike

In https://github.com/broadinstitute/cmQTL/issues/44#issuecomment-648384595 we see that both 6_4 and 3_1 have good replicate correlation (i.e. repeated measurements of the same features in the same cell line tend to be similar, across the experiment), so you're in good shape here.

Below, we see that 0_0 is correlated with other Area_Shape features (when looking at the well-level averaging of profiles); Extent is the easiest to explain

Extent: The proportion of the pixels (2D) or voxels (3D) in the bounding box that are also in the region. Computed as the area/volume of the object divided by the area/volume of the bounding box.
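
A small sketch of that definition for a 2D binary mask, written from the description above rather than from CellProfiler's source; `extent` and the toy shapes are illustrative names.

```r
# Extent of a 2D binary mask: object area divided by bounding-box area.
extent <- function(mask) {
  rows <- which(rowSums(mask) > 0)
  cols <- which(colSums(mask) > 0)
  bbox_area <- (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
  sum(mask) / bbox_area
}

# A filled square fills its bounding box completely.
square <- matrix(TRUE, nrow = 10, ncol = 10)

# A pixelated disk of radius 10 centered in a 21 x 21 image.
idx <- expand.grid(r = 1:21, c = 1:21)
disk <- matrix((idx$r - 11)^2 + (idx$c - 11)^2 <= 100, nrow = 21, ncol = 21)

print(c(square = extent(square), disk = extent(disk)))
```

Note that a box-filling shape scores 1, while an ideal circle can only reach about pi / 4 (roughly 0.785), so Extent measures how compactly the object fills its bounding box rather than roundness as such.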

Rplot

library(tidyverse)
library(magrittr)

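# Load the augmented profiles for one plate
df <- read_csv("1.profile-cell-lines/profiles/BR00106708_augmented.csv")

# Cells_AreaShape features, excluding Center_Z and all Zernike columns
df1 <- 
  df %>% 
  select(matches("Cells_AreaShape")) %>%
  select(!matches("Cells_AreaShape_Center_Z")) %>%
  select(!matches("Cells_AreaShape_Z"))

# Zernike columns only
df2 <- 
  df %>% 
  select(matches("Cells_AreaShape_Z"))

# Zernike_0_0 alongside the non-Zernike AreaShape features
df3 <- 
  bind_cols(df2 %>% 
              select(Cells_AreaShape_Zernike_0_0), 
            df1)

# Shorten the column names for plotting
names(df3) <- str_remove_all(names(df3), "Cells_AreaShape_")

# Long format: one row per (profile, feature), keeping Zernike_0_0 as its own column
df3 %<>% 
  pivot_longer(-Zernike_0_0)

# Zernike_0_0 against every other AreaShape feature, one panel per feature
ggplot(df3, aes(Zernike_0_0, value)) + geom_hex() + facet_wrap(~name, scales = "free_y")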

shntnu commented 3 years ago

Also, I confirm that those 3 Zernike features are indeed (anti) correlated but I don't have an intuition as to why

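# Pairwise scatterplots and correlations for the three Zernike features
# (df is the profile table loaded in the previous snippet)
df4 <- 
  df %>% 
  select(Cells_AreaShape_Zernike_0_0,
         Cells_AreaShape_Zernike_3_1,
         Cells_AreaShape_Zernike_6_4)

GGally::ggpairs(df4)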

jatinarora-upmc commented 3 years ago

@shntnu oh, that thread about cell Zernikes with Soumya and you is even pasted in my notebook. It was really helpful in seeding the concept of morphology in my mind. Thanks much for tagging it here :)

Nice, I usually take features with replicate correlation > 0.5, and it is great that 6_4 and 3_1 were among them.

So, it seems the concern about znf436's associations being confounded by cell count is much smaller now. I would also cross-check them by taking proliferation time into account.
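
For reference, a rough sketch of what a replicate-correlation filter like the "> 0.5" rule above could look like for a single feature. The exact definition used in the linked cmQTL issue may differ; here it is taken to be the mean pairwise Pearson correlation between replicate columns, on simulated data, and `replicate_correlation` is a hypothetical helper.

```r
set.seed(3)
n_lines <- 30
n_reps <- 4
line_signal <- rnorm(n_lines)   # simulated true per-line value of a feature

# Rows = cell lines, columns = replicate wells, for one feature each.
reproducible <- sapply(seq_len(n_reps), function(i) line_signal + rnorm(n_lines, sd = 0.3))
noisy        <- sapply(seq_len(n_reps), function(i) rnorm(n_lines))

# Mean pairwise Pearson correlation between replicate columns.
replicate_correlation <- function(m) {
  cm <- cor(m)
  mean(cm[upper.tri(cm)])
}

scores <- c(reproducible = replicate_correlation(reproducible),
            noisy = replicate_correlation(noisy))
print(scores)
print(scores > 0.5)   # features passing the filter
```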

broadinstitute / cmQTL

Nov 2020 Discussions (associations with znf436) #63