Closed MiguelRodo closed 2 years ago
This commit on the targets branch neatened the winsorise function (partly just to change how the variables were winsorised), but worth noting the neatening. It was not used by AnalysisACSCyTOFNKBCells, though.
It DOES FIX the offset-related issue (well, the code aims to, anyway!).
Done in PipelineAnalysisACS v0.2.0.9002.
Note that I changed the y shrinkage from 2 to 3, i.e. increased it. I think I had used 2 before because it wasn't doing much, but that was likely because the offsets weren't being accounted for.
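To make the offset point concrete, here is a minimal sketch, not the actual winsorise code. It assumes that "shrinkage" means clamping values to median ± mult × mad, and that the offset is the denominator of a count response. The function name shrink_with_offset and the example data are hypothetical.

```r
# Hypothetical helper (not the package's winsorise function):
# clamp a count response on the rate scale (count / offset),
# then map the clamped rates back to the count scale.
shrink_with_offset <- function(count, offset, mult = 3) {
  rate <- count / offset
  centre <- stats::median(rate, na.rm = TRUE)
  spread <- stats::mad(rate, na.rm = TRUE)
  lower <- centre - mult * spread
  upper <- centre + mult * spread
  # pmax/pmin clamp each rate into [lower, upper]
  rate_shrunk <- pmin(pmax(rate, lower), upper)
  rate_shrunk * offset
}

# Made-up example: the last sample has an extreme rate
count <- c(10, 15, 9, 14, 200)
offset <- c(1000, 1200, 1100, 900, 1000)
count_shrunk <- shrink_with_offset(count, offset, mult = 3)
count_shrunk
```

Clamping the raw counts instead would mix up samples with large offsets and samples with genuinely extreme rates, which may be why a multiplier of 2 seemed to do little before the offsets were accounted for.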
We need to check that this is correct for the bin and betabin models, as I'm not sure the offset is treated in the same way. I think it may simply be already divided into the response.
In iter_tbl, var_offset is set to "none" when either the binomial or beta-binomial model is used, so this was never a problem for those models.
Okay, so I also need to select what the multiplier should be for the sd/mad when shrinking.
So I'll need to do this for the following:
[Before and after plots (two pairs); images not preserved.]
The primary issue is with the ones where we should've been logging, and it looks jolly reasonable to me, even for the IL2+TNF+ setting.
The changes are not large for C9 or MMP1 when we use 0.025 and 0.975 as the percentiles (but they look kind of better).
Kind of looks okay for RISK6 as well, if a bit harsher than for the others.
No problem!
I think we'll go with using an sd/mad mult of 3 for the responses, and winsorise with 0.025 and 0.975 for the inflammation markers.
So I'll need to code in that change now.
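As a rough sketch of what the percentile-based winsorising for the inflammation markers could look like (the helper name winsorise_quantile and the simulated data are hypothetical, and this is not the code from the actual commit):

```r
# Hypothetical helper: clamp x to its own lower and upper quantiles.
# NA values are ignored when computing the bounds and remain NA in the output.
winsorise_quantile <- function(x, p_lower, p_upper) {
  bounds <- stats::quantile(
    x,
    probs = c(p_lower, p_upper),
    na.rm = TRUE,
    names = FALSE
  )
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Simulated marker values with one extreme point in each tail
set.seed(1)
marker <- c(stats::rnorm(98), -15, 20)
marker_wins <- winsorise_quantile(marker, p_lower = 0.025, p_upper = 0.975)
range(marker_wins)
```

Passing the probabilities as arguments keeps the cut-offs easy to change per marker if, say, RISK6 turns out to need a gentler setting.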
The investigating code, btw, to generate the above was as follows:
quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))
max_val <- quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, 0.975)
min_val <- quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, 0.025)
library(ggplot2)
p <- ggplot(
  DataTidyACSRISK6::data_tidy_risk6,
  aes(y = risk6)
) +
  geom_boxplot()
cowplot::save_plot(
  filename = "p.png",
  p
)
p_wins <- ggplot(
  DataTidyACSRISK6::data_tidy_risk6 |>
    dplyr::mutate(
      risk6 = pmax(pmin(risk6, max_val), min_val)
    ),
  aes(y = risk6)
) +
  geom_boxplot()
cowplot::save_plot(
  filename = "p_wins.png",
  p_wins
)
Ran the following when debug = "preprocess" in the faust_cyt.Rmd file:
p <- ggplot(data_raw, aes(x = Progressor, y = MMP1)) +
  geom_boxplot()
cowplot::save_plot(
  filename = "p.png",
  p
)
data_raw_wins <- winsorise(
  data_raw = data_raw,
  wins = "wins_x",
  p_dots = p_dots,
  iter = iter
)
p_wins <- ggplot(data_raw_wins, aes(x = Progressor, y = MMP1)) +
  geom_boxplot()
cowplot::save_plot(
  filename = "p_wins.png",
  p_wins
)
UtilsDataRSV::view_cols(
  data_raw
)
data_raw[1, c("pop", "pheno", "combn")]
quantile(data_raw$C9, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(data_raw$C9, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))
quantile(data_raw$RISK6, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(data_raw$RISK6, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))
Resolved in 6cff83d25ac5fac986ec9d7cc032e773655babca.