Open GWMcElfresh opened 3 months ago
Recording this beyond the ID meeting:
Greg's current heurstic thresholds are based on a strategy of 1. get the counts slot -> 2. asinh() -> 3. CLR().
I think this solves some problems and causes others, but once we're pretty satisfied, this flag should probably default to TRUE.
Recording this beyond the ID meeting:
Greg's current heurstic thresholds are based on a strategy of 1. get the counts slot -> 2. asinh() -> 3. CLR().
I think this solves some problems and causes others, but once we're pretty satisfied, this flag should probably default to TRUE.
Based on review of failure cases, my current preference is to not apply asinh() transformation to the data.
yeah, I'm on board for the internal workings.
Upstream of this function, what are we thinking about cell typing prior to ADT quality checking (the first two boxes of the checklist above)?
We could either bake in some Seurat processing into this (e.g. provide an optional whitelist of cell types and subset) or not.
I can't precisely recall, but did the RIRA objects you were using to test include bulk cells or just T/NKs?
yeah, I'm on board for the internal workings.
Upstream of this function, what are we thinking about cell typing prior to ADT quality checking (the first two boxes of the checklist above)?
We could either bake in some Seurat processing into this (e.g. provide an optional whitelist of cell types and subset) or not.
I can't precisely recall, but did the RIRA objects you were using to test include bulk cells or just T/NKs?
Just T/NKs.
Hi all,
This PR builds off of @gboggy2 's recent work with triaging ADTs and builds up a basic skeleton for how we might get at this.
The core of this function is using
multimode:locmodes()
to find modes and antimodes within ADTs, but as Greg has identified, it's more sensitive to mixture distributions than we'd like.It follows up with an autocorrelation-based approach, exploiting the higher level of symmetry of unimodal distributions as you lag them compared to multimodal distributionsThis function doesn't yet do any classification, but it implements (most of) Greg's triaging. As is, it returns a dataframe with a boat-load of distribution summary statistics so we can get a sense of what's happening at scale.
I ran this on a mixed cell type RIRA, with what I think was an ADT filter applied (518154). There were a few failure cases with
locmodes()
where it would return a mode and some NAs. Greg, these failures broke the for loop where you were incrementing the mod0 parameter to find more and more modes, so I left that as a TODO for you to do battle with.I did some basic infrastructure overhauling to make this run faster/parallelizable, but once I moved around some Seurat manipulation it was fast enough on full RIRA.
These are all using Greg's original thresholds on what I think was RIRA T_NK. The thresholds may need to react to the ADT they're working on (or not?)
TODO:
for the 2nd TODO: it may be that we enforce a hierarchy like:
triageList = list("TNK" = c("CD4", "CD8"), "Macrophage" = c("CD68"), "Bcell" = c("CD20"))
and then parse the tree:Here are some plots that I thought might be helpful and accompanying ggplot code: