Open hkhawar opened 4 years ago
Whitening approach has somewhat removed batch affects
I think "somewhat" is the key word :) I think that @sMyn42 can improve this with his Harmony implementation as described in #3
@shntnu @gwaygenomics Please have a look at the update whitening results. I am going to use 1b whitening strategy for later analysis
looks good @hkhawar - I recommend writing your code in such a way that we can easily drop in new single cell profiles in the future, for two reasons:
For 2B, we'll want to compare different tools (ZCA spherize, Harmony, unnormalized, etc.) so code should be modular.
@gwaygenomics Here is the code which have been used https://github.com/broadinstitute/2020-06-01-Evidence-of-state-switching-in-single-cell-drug-response-Broad/blob/master/Python_scripts/101420_switching_modes.ipynb
I will work on it so that can usable for single cell profiles in future. We definitely have heterogenous single cell dmso population + batch effects in this data. Definitely batch correction implementation removes batch effect. We need to apply some kind of test how much is the improvement for each method
Sounds great @hkhawar !
We need to apply some kind of test how much is the improvement for each method
Totally agree 💯 - @sMyn42 was also working on a method (kBET) to quantify the impact of batch, which we can apply to all forms of the profiles we generate (unnormalized and different batch corrected data). Alternatively, we can do something similar to what we saw in one of the cytodata presentations yesterday:
@gwaygenomics sounds good.
@hkhawar I wasn't able to get to this sooner, but looking forward to discussing tomorrow.
@shntnu
I have subtracted mean of DMSO cells from the feature matrix and then applying a whitening transformation on it.
Here below is the code
While following below is Juan code which is similar to the one I use
@hkhawar The spherizing (aka whitening) transformation is good.
The issue is that you want to apply this at the aggregate level (well level) and then transfer those per-well transformations to the cells in that well.
So let's say you spherize 24 DMSO wells on a plate.
Now, say, the mean profile of well A01 was p before transformation and p' after transformation.
What I am proposing is that you add the vector (p'-p) to each single cell in A01. And then Repeat for all wells on the plate.
Before implementing, please check with Juan if he has tested this in LUAD already.
Now, say, the mean profile of well A01 was p before transformation and p' after transformation.
What I am proposing is that you add the vector (p'-p) to each single cell in A01.
Interesting! I think this approach sounds great. pycytominer will not be able to handle this specific procedure, but it should be fairly straightforward to implement with a few pycytominer hacks.
@hkhawar - I can add it to #6 if you think it makes sense. What is the plan for next steps?
@shntnu Now I got it what you actually meant. I will ask Juan about it.
@gwaygenomics I am not familiar using Pycytominer profiling pipeline. Is there any guideline to use it?
@shntnu Now I got it what you actually meant. I will ask Juan about it.
Great, feel free to tag me in any conversation if needed, but pretty sure this will work.
@gwaygenomics I am not familiar using Pycytominer profiling pipeline. Is there any guideline to use it?
The documentation is still sparse. I have been working on it on and off for several months, but it still needs more love. I hope that people will be able to contribute to this specifically, at the next jamboree!
There is some minor documentation available here though: https://github.com/cytomining/pycytominer
Also, @hkhawar - once you think this is the right plan, I would be happy to contribute to it in #6 - I estimate that it will take me 1 hour to do.
@gwaygenomics Sounds good to me
@gwaygenomics @shntnu I checked with Juan and he told me that he didn't use this strategy. I have implemented this strategy and here below is the UMAP of single cells profiles from one plate.
Here are the steps which I followed
1) Created Aggregate well-level profiles P 2) Normalize and whitened transformed well-level profiles, which results in P' 3) For each well, computed residual profile which is subtraction between the transformed profile and the original profile: R = P' - P 4) add the residual profile of each well to single cell profiles in that well: single cell profiles + R
This separate out all 24 wells single cell profiles in UMAP space.
Here below are results with or without ZCA sphering
ZCA Whitening
Even the ZCA whitening approach it looks like it is trying to separate wells profiles in UMAP space. Only in unwhitened data we don't see this pattern.
@hkhawar thank you for testing this out! Clearly, that approach produced the least useful result thus far but very insightful.
But I'm a bit confused about the non-spherized data
In https://github.com/broadinstitute/2020-06-01-Evidence-of-state-switching-in-single-cell-drug-response-Broad/issues/2#issuecomment-721280300, you have this plot
In https://github.com/broadinstitute/2020-06-01-Evidence-of-state-switching-in-single-cell-drug-response-Broad/issues/2#issue-701271357, you have this plot:
Should these two look similar (except the colors)?
Oh wait, I think I get it. The first plot is only of a single plate (all 24 wells), correct?
@shntnu I have used a single plate SQ00015201 (red datapoint in the bottom plot). In the top one I used colormap based on wells.
@shntnu Sorry I was wrong actually SQ00015142 are plotted as yellow dot. I will check it why it is so
Thanks for clarifying. The fact that all these 5 have similar structure gives hope that there's a way to eventually align these data.
But to get you moving forward: it might be sufficient to look at just one plate and do all the analysis within the plate (because it appears that the well-position effect may not be as strong as we thought).
(1) Any concerns? (2) Before doing so, we'd want to make sure that the within-plate well position effect is not too large in the DMSO single-cell profiles. How should we decide if it is not too large?
PS – I ran MANOVA on the data and (unsurprisingly) found differences among the DMSO single-cell profiles. This is not terribly useful, but at least confirms that well position effects are present. I spent exactly 5 minutes on this so I can't say this is a good path to go down, but documenting here in case we want to go back to this later.
Thanks for the analysis. I am currently working with harmony corrected data which Greg generated a day ago and we will see if we see any improvement using that approach
I ran a manova script for harmony corrected data and found that differences are still persisted among wells
dang... one question: Is this for one plate, or all five? If all five, then we probably should include a batch
factor in the manova.
One possible explanation: maybe only one (or a few wells) is driving the significance. See the Df = 23 bit? We can do a tukey's posthoc test to figure that out. Also, the approx F is not high... which is good! I do wonder what the F score is like for the other normalization strategies too.
I do something similar to this in broadinstitute/profiling-resistance-mechanisms#91
@gwaygenomics here below are the results from implementing second strategy
UMAP is trained on DMSO single cell and transformation of compound treated single cells in to a DMSO space
Note: DMSO cells are in the last panel. All the conditions looks similar except for differences in cell densities are seen for some conditions
Z-normalization of features is performed and a total of 471 features are selected for downstream analysis
Plate cell densities
UMAP clustering of single cell data clearly show batch affects among replicate plates
Whitening is applied on single cell data set and whitened data is plotted in an unsupervised way in UMAP space.
Whitening strategies
1- ZCA-sphering
1a - ZCA sphering applied to unnormalized features
1b - ZCA sphering applied to z-normalized features
ZCA sphering of z-normalized dataset gave best results in removing batch affects
PCA sphering applied to z-normalized features