Closed GWMcElfresh closed 6 months ago
While I coded this up to support pairwise tests, I suspect this will mostly be used for non-parametric anova. This could stand to have a companion function that does do a post hoc test for specific pairs of treatmentField
. I think the Conover-Iman test from conover.test
is probably fine for this. It's is on CRAN and will compute all pairwise tests.
edit: I updated the function to iterate through each cluster that's significant by the Kruskal-Wallis test. It will print a tile plot (with columns & rows populated with treatmentField
variables).
Right now, the color is determined by the p-value, but I think plotting the T statistic like the author does in (updated below)conover.test()
is more helpful
This function now, if a there are multiple levels per cluster, will run a Kruskal-Wallis test followed by a Conover-Iman post hoc test if the Kruskal-Wallis was significant.
The Conover-Iman will populate T statistics and p values for each cluster and print a plot like the one below to help interpret which elements of treatmentField
the cluster is enriched for. Here, it's (approximately) RhCMV-TB/9Ag < RhCMV-TB/6Ag = IV-BCG < Unvax < Intradermal BCGs = RhCMV/Gag.
You'll end up getting a lot of these plots (at least I did with cohorts of n = 16), but this adds a lot of oomph to "this cluster is important because of how biased the clusters on the UMAP looks"-based approaches.
Cluster enrichment: Clusters with any enrichment:
I think this implementation (printing cluster enrichment plots only when post-hoc test is required) strikes the right the balance between returning a seurat object versus showing the cluster enrichment plots (although, this can take awhile to print the highlighted DimPlot). The alternative is returning a non-seurat object though, which could nuke a user's Seurat object accidentally. You'd want something simple (p values for each cluster) if your Seurat object only contained two levels of treatmentField
.
As long as the colorspace
import works fine, I think this is good to go.
I also just added a check that ensures the characters " - ", which are used to determine Group1 and Group2 in the post-hoc test, throw an error if they appear in treatmentField
and ask the user to change it.
Hi everyone,
I saw an approach of using non-parametric tests for proportions that I think strikes the right balance for "enrichment" that we normally mean colloquially. The crux of the idea is that
SubjectId
is sampled independently of clustering (mostly true, though subject specific recovery violates this a little), so by testing vaccine 1 proportions against vaccine 2 proportions, we can test the two vectors of proportions for "treatment/vaccine" differences.I'll edit this post with some plots as I add themedit: plots added in comment below.Best regards, GW