SATVILab / PipelineAnalysisACS

Analysis functions for pipeline for ACS-related analyses.

Correct shrinkage for models involving offset #9

Closed MiguelRodo closed 2 years ago

MiguelRodo commented 2 years ago

This commit on the targets branch neatened the winsorise function (partly just to change how the variables were winsorised), which is worth noting. It was not used by AnalysisACSCyTOFNKBCells, though.

It DOES FIX the offset-related issue (well, the code aims to, anyway!).

MiguelRodo commented 2 years ago

Done in PipelineAnalysisACS v0.2.0.9002.

MiguelRodo commented 2 years ago

Note that I changed the y shrinkage from 2 to 3, i.e. increased it. I think I had used 2 before because the shrinkage wasn't doing much, but that was likely because the offsets weren't being accounted for.
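
To illustrate why the offset matters here, below is a minimal sketch of shrinking a count response on the rate scale (count divided by offset) rather than on the raw count scale. The function name `shrink_rate`, its arguments and the toy data are all hypothetical; this is not the package's actual implementation, just one way the idea could look.

```r
# Hypothetical helper (not from PipelineAnalysisACS): shrink extreme
# responses towards median +/- mult * mad, working on the rate scale.
shrink_rate <- function(y, offset, mult = 3) {
  rate <- y / offset                  # account for the offset first
  centre <- stats::median(rate)
  spread <- stats::mad(rate)          # scaled median absolute deviation
  lower <- centre - mult * spread
  upper <- centre + mult * spread
  rate_shrunk <- pmin(pmax(rate, lower), upper)
  rate_shrunk * offset                # back to the count scale
}

# Toy data: the fifth count is extreme relative to its offset, while the
# sixth count is large only because its offset is large.
y <- c(10, 15, 8, 12, 200, 150)
offset <- c(1000, 1200, 1100, 900, 1000, 12000)
y_shrunk <- shrink_rate(y, offset, mult = 3)
```

On the raw count scale the sixth observation would look like an outlier, but on the rate scale it is typical, so only the fifth observation is pulled in.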

MiguelRodo commented 2 years ago

We need to check that this is correct for the bin and betabin models, as I'm not sure the offset is treated in the same way. I think the offset may simply already be divided into the response.

MiguelRodo commented 2 years ago

In iter_tbl, var_offset is set to "none" when either the binomial or beta-binomial models are used:

image

So this was never a problem for those models.
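
As a rough illustration of why a binomial-type model does not need a separate offset, here is a sketch using base R's `glm` on made-up data. The variable names (`n_pos`, `n_total`, `group`) are assumptions for the example, and this is not necessarily how the pipeline fits its models; a beta-binomial fit would need an additional package and is not shown.

```r
# Toy data (made up): positive-cell counts out of total cells, two groups.
n_total <- c(500, 800, 1200, 650, 900, 1000)
n_pos <- c(5, 9, 11, 14, 20, 21)
group <- rep(c(0, 1), each = 3)

# Count model: the denominator enters as an explicit offset on the log scale.
fit_pois <- stats::glm(
  n_pos ~ group + offset(log(n_total)),
  family = stats::poisson()
)

# Binomial model: the denominator is part of the two-column response,
# so no offset term is supplied.
fit_bin <- stats::glm(
  cbind(n_pos, n_total - n_pos) ~ group,
  family = stats::binomial()
)
```

In the binomial fit the total is already built into the response, which is consistent with var_offset being "none" for those models.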

MiguelRodo commented 2 years ago

Okay, so I also need to select what the multiplier should be for the sd/mad when shrinking.
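
One way to get a feel for the multiplier is to count how many points fall outside centre ± mult × spread for a few candidate values. The sketch below does this on made-up data; `bounds` and `n_outside` are hypothetical helpers, not functions from the package.

```r
# Made-up values with one clear outlier.
x <- c(0.8, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 6.0)

# Hypothetical helper: limits from median/mad (robust) or mean/sd.
bounds <- function(x, mult, robust = TRUE) {
  centre <- if (robust) stats::median(x) else mean(x)
  spread <- if (robust) stats::mad(x) else stats::sd(x)
  c(lower = centre - mult * spread, upper = centre + mult * spread)
}

# Hypothetical helper: number of points that would be shrunk.
n_outside <- function(x, mult, robust = TRUE) {
  b <- bounds(x, mult, robust)
  sum(x < b[["lower"]] | x > b[["upper"]])
}

# Compare multipliers of 2 and 3 for both choices of spread.
comparison <- expand.grid(mult = c(2, 3), robust = c(TRUE, FALSE))
comparison$n_outside <- mapply(
  n_outside,
  mult = comparison$mult,
  robust = comparison$robust,
  MoreArgs = list(x = x)
)
```

With these toy values the outlier inflates the sd enough that a multiplier of 3 no longer flags it, whereas the mad-based limits still do.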

MiguelRodo commented 2 years ago

So I'll need to do this for the following:

MiguelRodo commented 2 years ago

Response variables

faust_cyt, some pheno and combn, summed

Before:

image

After:

image

faust_cyt, cd4 single gamma, summed

Before:

image

After:

image

faust_cyt, cd4 triple positive

Before:

image

After:

image

faust_cyt, cd4 IL2+TNF+

Before:

image

After:

image

MiguelRodo commented 2 years ago

The primary issue is with the ones where we should've been logging, and it looks jolly reasonable to me, even for the IL2+TNF+ setting.

MiguelRodo commented 2 years ago

C9

Before:

image

After:

image

MiguelRodo commented 2 years ago

There are not great changes for C9 or MMP1 when we use 0.025 and 0.975 as the percentiles (but they look kind of better).

RISK6

Before:

image

After:

image

MiguelRodo commented 2 years ago

Kind of looks okay for RISK6 as well, if a bit harsher than for the others.

No problem!

I think we'll go with an sd/mad multiplier of 3 for the responses, and winsorise the inflammation markers at the 0.025 and 0.975 quantiles.

So I'll need to code in that change now.
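
A minimal sketch of what quantile-based winsorising could look like as a standalone function is below. `winsorise_quantile` is a hypothetical name with a different interface from the package's `winsorise` function, and the probabilities are arguments so that whichever cut-offs are settled on can be passed in.

```r
# Hypothetical helper (not the package's winsorise function): clamp a
# numeric vector to the given lower and upper quantiles.
winsorise_quantile <- function(x, probs = c(0.025, 0.975)) {
  stopifnot(length(probs) == 2, probs[1] < probs[2])
  lims <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmax(pmin(x, lims[2]), lims[1])
}

# Toy data: 1 to 99 plus one extreme value.
x <- c(1:99, 1000)
x_wins <- winsorise_quantile(x, probs = c(0.025, 0.975))
```

Here the extreme value of 1000 is pulled down to the 97.5th percentile, and the smallest values are pulled up to the 2.5th percentile.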

MiguelRodo commented 2 years ago

The investigative code used to generate the above, btw, was as follows:

PipelineAnalysisACS package

# Inspect the upper and lower tails of RISK6
quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))

# Winsorisation limits: 2.5th and 97.5th percentiles
max_val <- quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, 0.975)
min_val <- quantile(DataTidyACSRISK6::data_tidy_risk6$risk6, 0.025)
library(ggplot2)

p <- ggplot(
  DataTidyACSRISK6::data_tidy_risk6,
  aes(y = risk6)
) +
  geom_boxplot()

cowplot::save_plot(
  filename = "p.png",
  p
)

p_wins <- ggplot(
  DataTidyACSRISK6::data_tidy_risk6 |>
    dplyr::mutate(
      risk6 = pmax(pmin(risk6, max_val), min_val)
    ),
  aes(y = risk6)
) +
  geom_boxplot()

cowplot::save_plot(
  filename = "p_wins.png",
  p_wins
)

AnalysisACSCyTOFTCells

Ran the following when debug = "preprocess" in the faust_cyt.Rmd file:

p <- ggplot(data_raw, aes(x = Progressor, y = MMP1)) +
  geom_boxplot()

cowplot::save_plot(
  filename = "p.png",
  p
)

data_raw_wins <- winsorise(
  data_raw = data_raw,
  wins = "wins_x",
  p_dots = p_dots,
  iter = iter
)

p_wins <- ggplot(data_raw_wins, aes(x = Progressor, y = MMP1)) +
  geom_boxplot()

cowplot::save_plot(
  filename = "p_wins.png",
  p_wins
)

UtilsDataRSV::view_cols(
  data_raw
)

data_raw[1, c("pop", "pheno", "combn")]

quantile(data_raw$C9, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(data_raw$C9, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))

quantile(data_raw$RISK6, c(0.9, 0.95, 0.975, 0.99, 0.995, 1))
quantile(data_raw$RISK6, rev(1 - c(0.9, 0.95, 0.975, 0.99, 0.995, 1)))

MiguelRodo commented 2 years ago

Resolved in 6cff83d25ac5fac986ec9d7cc032e773655babca.