Open jessica-ewald opened 2 months ago
For 3, there are 4 replicates of every control, so we've already been doing pretty robust ctrl-ctrl comparisons to generate our null distribution. I think this is perfectly fine to continue with! In this case, we could use any allele as a negcon (so long as it is repeated), so we may not need specifically labelled 'negcons'. Having a decent number of repeated controls is important, so maybe it's a good idea to include another mislocalization poscon wt-var pair instead of the current negcons?
For 4, there are images of AGP here: https://github.com/broadinstitute/2021_09_01_VarChAMP/issues/23#issue-2388901414. It's not really blurry; rather, when the z-plane is close to the bottom of the cell, the intensity is much higher because that's where the actin attaches to the plate (Beth's explanation). A simple filter could be to compute the median AGP intensity and flag outliers (w.r.t. the replicates).
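To make the filter idea concrete, here is a minimal sketch of flagging replicate wells by median AGP intensity. It assumes one median AGP value per replicate well is already available; the function name, the well IDs, the MAD-based score, and the 3.5 cutoff are all illustrative assumptions, not something we have settled on.

```python
from statistics import median

def flag_agp_outliers(median_agp_by_well, threshold=3.5):
    """Return wells whose median AGP intensity is a robust outlier among replicates.

    median_agp_by_well: dict of well ID -> median AGP intensity (hypothetical input).
    threshold: cutoff on the MAD-based modified z-score (assumed value).
    """
    values = list(median_agp_by_well.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that one bad replicate cannot dominate
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # all replicates (nearly) identical, nothing to flag
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [
        well
        for well, v in median_agp_by_well.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]

# Made-up intensities for four replicates of one allele
replicates = {"A01": 0.10, "B01": 0.11, "C01": 0.09, "D01": 0.45}
flagged = flag_agp_outliers(replicates)
```

With only 4 replicates the MAD is itself a noisy estimate, so the cutoff would need checking against real plates.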
Great, for 3, makes sense! It would be nice to keep negcons in the hopes that some day we can detect them. For 4, I see. It really seems there's no choice but to reduce the technical noise introduced by the auto-focusing (in future data collections). Filtering out 'bad' images can only get us so far!
Here are 3 options for new mislocalization positive controls. I selected WT-VAR pairs that are clearly distinguishable, have similar intensity/protein abundance, and where the cells have a high count and look healthy for both WT and VAR.
1. HPRT1 His204Asp (WT and VAR images)
2. GMPPB Asp27His (WT and VAR images)
3. RAB33B Lys46Gln (WT and VAR images)
Possible morphology positive controls
To recap, choosing these quantitatively is difficult because the 95th percentile of the ctrl-ctrl morphology AUROC null is 0.99, so only 0 or 1 wt-var pairs make it past the threshold. We thought we still might be able to choose some by visually examining images for the wt-var pairs that gave the highest AUROC values.
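For anyone following along, the thresholding described above can be sketched roughly like this. It is a simplified illustration, not the actual pipeline code: the function names, the linear-interpolation percentile, and the example numbers are assumptions.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def call_hits(null_aurocs, pair_aurocs, q=95):
    """Return (threshold, hits): wt-var pairs whose AUROC exceeds the null percentile."""
    threshold = percentile(null_aurocs, q)
    hits = {pair: auroc for pair, auroc in pair_aurocs.items() if auroc > threshold}
    return threshold, hits

# Made-up ctrl-ctrl AUROCs and wt-var AUROCs, for illustration only
null = [0.5, 0.6, 0.7, 0.8, 0.99]
pairs = {"pair_a": 0.96, "pair_b": 0.90}
threshold, hits = call_hits(null, pairs)
```

When the null itself reaches 0.99, the threshold sits so close to 1 that almost no pair can exceed it, which is the problem we are running into.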
I plotted images of the top 12 wt-var pairs (mean AUROC > 0.95). After looking at the images, I excluded pairs where cell health was low overall for both wt and var, where the cell count differed greatly between wt and var, or where I could not visually distinguish the morphology of wt and var. This left me with two wt-var pairs:
GSS Arg125Cys
KLHL3 Cys164Phe
Ooof, those honestly seem pretty subtle to me! I can convince myself there is a difference, but only if looking back and forth between them. It's not like a glance at an image would immediately make it obvious which category it's in.
If someone else feels they are consistently distinguishable and can verbalize how, then great. If not, I wonder if we should go with a backup plan of looking for samples where cell count is fairly normal but something like nucleus eccentricity (a very easy-to-see shape measure) is distinctive? Of course, we don't have an infinite number of pairs to test, so maybe no pair differs strongly in that feature. We could try nucleus or cell area too.
The main difference that I see is that WT cells look rounder, and VAR cells look more spindly. I agree that it's very subtle - was hoping for something much more distinguishable!
I can try targeted features like you suggested.
Eccentricity might be even more visible for cells than for nucleus actually so that may be a good route!
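A rough sketch of how the targeted-feature screen could look, assuming we have per-cell eccentricity values for the WT and VAR wells of a pair. The function name, the use of Cohen's d as the effect size, and the 1.5 cell-count ratio cutoff are all placeholder assumptions to be tuned.

```python
from statistics import mean, stdev

def eccentricity_effect(wt, var, max_count_ratio=1.5):
    """Cohen's d for VAR vs WT eccentricity, or None if the pair is unusable.

    wt, var: per-cell eccentricity values (hypothetical inputs).
    max_count_ratio: skip pairs whose cell counts differ by more than this factor.
    """
    n_wt, n_var = len(wt), len(var)
    if min(n_wt, n_var) < 2:
        return None  # not enough cells to estimate a spread
    if max(n_wt, n_var) / min(n_wt, n_var) > max_count_ratio:
        return None  # cell counts too different to compare fairly
    pooled_var = ((n_wt - 1) * stdev(wt) ** 2 + (n_var - 1) * stdev(var) ** 2) / (
        n_wt + n_var - 2
    )
    if pooled_var == 0:
        return None  # no variation at all, effect size undefined
    return (mean(var) - mean(wt)) / pooled_var ** 0.5

# Made-up per-cell values: VAR cells more elongated than WT
d = eccentricity_effect([0.5, 0.6, 0.7], [0.7, 0.8, 0.9])
```

Ranking pairs by the absolute value of this score would give a shortlist to check by eye. The same function works for nucleus or cell area by swapping the input feature.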
re poscons, I forgot to post when offline traveling:
Oh wow, well 2. GMPPB Asp27His is just spectacular to look at and seems extremely consistent in both versions (REF and VAR). It will rely on good segmentation to be detected since it’s pretty cytoplasmic in both cases, just has the extra plasma membrane that makes it distinct.
The others are more variable: perhaps useful if you want a milder poscon that is closer to the boundary of what we hope to find? I could go either way on that. If you pick a milder one, maybe 1. HPRT1 His204Asp is a bit better because it’s a bit more consistent well to well.
Thank you Jess!
I like the idea of having a milder poscon closer to our cutoff boundary. Could we perhaps drop a negcon to include both a mild and a strong localization poscon? How useful do you think this is for analysis?
3 & 4. When you construct the control-control NULL using the negcons, is that the exact same negcon compared to itself (i.e. RHEB vs. RHEB) or negcons compared to one another (i.e. RHEB vs. SLIRP)? Although the selection of these negcons was based on CPJUMP, I don't think we can assume these are our best negcons/morphology controls. Does the scrambled controls plate help at all? Do you see consistent morphological differences irrespective of well position across constructs there? If so, we may not have selected the best constructs. What happens if you compare the morphological profiles of all WTs against each other?
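To spell out the two ways of building the null, here is a small sketch that enumerates both kinds of well pairs. The function name and well IDs are hypothetical, and it says nothing about which option the current analysis actually uses.

```python
from itertools import combinations

def null_pairs(wells):
    """Split all well pairs into same-allele and different-allele comparisons.

    wells: list of (allele, well_id) tuples (hypothetical layout).
    Returns (within, between), each a list of (well_id, well_id) pairs.
    """
    within, between = [], []
    for (allele_1, well_1), (allele_2, well_2) in combinations(wells, 2):
        if allele_1 == allele_2:
            within.append((well_1, well_2))   # e.g. RHEB vs. RHEB
        else:
            between.append((well_1, well_2))  # e.g. RHEB vs. SLIRP
    return within, between

wells = [("RHEB", "A01"), ("RHEB", "B01"), ("SLIRP", "C01"), ("SLIRP", "D01")]
within, between = null_pairs(wells)
```

A null built only from `within` pairs captures replicate-to-replicate noise, while `between` pairs also include any real differences between the negcon constructs.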
Background
There are 4 types of controls in the VarChAMP data: transfection control, positive control (morphology), positive control (localization), and negative control (morphology). We are revisiting the positive and negative controls because: 1) some variants are not what we thought they were, and 2) controls were chosen based on analyzing JUMP data, and we want to confirm that they are actually pos/neg cons based on VarChAMP data.
Positive localization control
Chloe asked if we have any suggestions from the 1% data. I would suggest taking one of the pathogenic variants that was sequenced and has both a high AUROC and a high confidence sequencing score (for both the WT and the Variant).
Positive and negative morphology controls
Chloe asked if I can confirm that SLIRP is an adequate negcon, and to select one of the candidate positive controls to replace PTK2B, which came back as something else when sequencing. This is where we have issues. In the current analysis/data state, there is no distinguishable difference between any REF-VAR pair. This is because when we construct the control-control NULL, we get the complete range of AUROC values: from 0 all the way to 0.99 (maybe even 1.0). Thus, no morphology profile comes back as a hit. We suspect this could be because of the confocal z-plane issue, where random replicates with quite different z-planes throw off the whole analysis. This holds even if we throw away AGP, which suffers from this the most.
What to do