A2-ai / gilead_ghqc_demo


scripts/validate_mrgsolve_model.qmd #2

Open jenna-a2ai opened 1 day ago


# mrgsolve Model Validation

Note: Please modify the checklist items to insert relevant QC context.

### 1. Prepare needed R objects

- [ ] Prepare needed R objects.

### 2. Generate exposure predictions from mrgsolve

- [ ] Generate exposure predictions from mrgsolve.
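
The exact simulation call depends on your model, but `mrg_test1` is typically built by simulating at the observed records and reshaping to long format so it mirrors `nm_preds` in the next step. A sketch only, reusing the `my_model` and `validation_data` names from later sections and assuming `USUBJID` can be recovered from the simulated output (e.g., by joining back on `ID`); adjust the captured parameters to your model:

```R
# Sketch: simulate at the observation records, then reshape to long format
mrg_test1 <- mrgsim(my_model,
                    data = validation_data,
                    etasrc = "data.all",   # use the ETAs carried in the data
                    obsonly = TRUE) |>
  as.data.frame() |>
  pivot_longer(cols = c(IPRED, CL, VC, Q1, VP1, Q2, VP2)) |>
  mutate(SOURCE = "mrgsolve") |>
  select(USUBJID, TIME, SOURCE, PARAM = name, value)
```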

### 3. Combine exposure predictions from mrgsolve and NONMEM

- [ ] Combine exposure predictions from mrgsolve and NONMEM.

```R
# NONMEM predictions of the same values
nm_preds <- nm_output |>
  filter(EVID == 0) |>
  pivot_longer(cols = c(IPRED, CL, VC, Q1, VP1, Q2, VP2)) |>
  mutate(SOURCE = "NONMEM") |>
  select(USUBJID, TIME, SOURCE, PARAM = name, value)

# Merge mrgsolve and NONMEM predictions
mrg_test <- bind_rows(mrg_test1, nm_preds) |>
  pivot_wider(id_cols = c(USUBJID, TIME, PARAM), names_from = SOURCE)
```

In the `pivot_longer()` calls (and anywhere else the parameter names are listed), substitute whatever parameters are relevant in this scenario. Make sure they are included in the `$CAPTURE` code block of your mrgsolve model specification! When this code is run, `mrg_test` will have columns USUBJID, TIME, PARAM, mrgsolve, and NONMEM. The column PARAM will hold one of IPRED, CL, VC, Q1, VP1, Q2, or VP2; the column mrgsolve will hold the value of that parameter as predicted by mrgsolve; and the column NONMEM will hold the value of that parameter as predicted by NONMEM.
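
For the parameter names used above, the corresponding block in the model specification would look something like this (names are illustrative; use your own):

```
$CAPTURE IPRED CL VC Q1 VP1 Q2 VP2
```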

:bell: "TIME" from NONMEM output may not precisely match the times in the source dataset, since NONMEM output typically carries less precision than our derived datasets. If `mrg_test1` and `nm_preds` have rows that cannot be matched on USUBJID and TIME, try retrieving the original TIME values from the source data and adding them to `nm_output`, or use a record number (LINE or NUM) as the row identifier instead.
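
A minimal sketch of the mismatch and one workaround, with toy data standing in for `mrg_test1` and `nm_preds` (here the NONMEM-style TIME is assumed to carry 4 significant figures):

```r
# Toy stand-ins: mrgsolve keeps full precision, NONMEM output is rounded
mrg_rows <- data.frame(USUBJID = "S1", TIME = 167.33333, mrgsolve = 1.23)
nm_rows  <- data.frame(USUBJID = "S1", TIME = 167.3,     NONMEM   = 1.23)

# Join on TIME rounded to the precision of the NONMEM tables
mrg_rows$TIME_KEY <- signif(mrg_rows$TIME, 4)
nm_rows$TIME_KEY  <- signif(nm_rows$TIME, 4)

merged <- merge(mrg_rows, nm_rows, by = c("USUBJID", "TIME_KEY"))
nrow(merged)  # 1: the records now match despite the precision difference
```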

### 4. Graphically examine output

- [ ] Graphically examine output.
```R
# Plot the comparison
ggplot(data = mrg_test, aes(x = NONMEM, y = mrgsolve)) +
  geom_abline(aes(color = "red", slope = 1, intercept = 0), 
              key_glyph = "abline") +
  geom_point(size = 0.7) +
  facet_wrap(vars(PARAM), scales = "free") +
  theme(legend.position = "bottom",
        legend.direction = "horizontal") +
  scale_color_identity(guide = "legend", name = NULL,
                       breaks = c("red"),
                       labels = c("y = x reference line"))
```

:x: If any figures have any points visibly deviating from the line y = x, validation is not complete until these differences are eliminated (or verified to be harmless and documented in the validation file).

If all of the figures look wrong, you may not be matching the right subjects, or you might not be pulling model parameter estimates from the NONMEM output correctly.

If most of the PK parameters fall on that line but one or two do not (e.g., VC, Q1, VP1, Q2, and VP2 all look great, but CL and IPRED are all over the place, with IPRED presumably affected by CL), then the problem may be a covariate or ETA value that affects a PK parameter but is not constructed correctly in `validation_data`.

If all the PK parameters look good but IPRED is inconsistent, then the concentration predictions may not be scaled to the volume in the observation compartment (and/or the correct units). Unlike in NONMEM, there are no reserved terms for scale parameters that are automatically handled; if IPRED is observed in compartment 1, then you must manually specify that IPRED = CENT/S1 (or whatever the appropriate variable names would be).
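
For example, if IPRED is observed in the central compartment, the model specification needs an explicit scaling step along these lines (compartment and volume names are illustrative; use your own symbols and units):

```
$TABLE
capture IPRED = CENT / VC;  // scale amount to concentration yourself
```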

### 5. Look at observations with the most extreme differences

- [ ] Look at observations with the most extreme differences.

```R
# Show the records with the largest absolute differences
mrg_test_ipred |> arrange(desc(ABSDIFF)) |> head() |> flextable()

mrg_test_ipred |> ggplot(aes(y = ABSDIFF)) + geom_boxplot()

# Discard records that could have large relative differences
# but minuscule absolute differences
mrg_test_ipred_reldiffs <- mrg_test_ipred |>
  filter(mrgsolve > 1E-5 | NONMEM > 1E-5)

mrg_test_ipred_reldiffs |> arrange(desc(PCTDIFF)) |> head() |> flextable()

mrg_test_ipred_reldiffs |> ggplot(aes(y = PCTDIFF)) + geom_boxplot()
```
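
The code above assumes `ABSDIFF` and `PCTDIFF` already exist on `mrg_test_ipred`; one way they might be constructed (column names follow `mrg_test`, toy values for illustration):

```r
# Toy stand-in for mrg_test filtered to PARAM == "IPRED"
mrg_test_ipred <- data.frame(mrgsolve = c(10.000, 0.500),
                             NONMEM   = c(10.100, 0.499))

# Absolute and percent difference, treating NONMEM as the reference
mrg_test_ipred$ABSDIFF <- abs(mrg_test_ipred$mrgsolve - mrg_test_ipred$NONMEM)
mrg_test_ipred$PCTDIFF <- 100 * mrg_test_ipred$ABSDIFF / abs(mrg_test_ipred$NONMEM)

all(mrg_test_ipred$PCTDIFF < 5)  # TRUE here -> within the 5% criterion
```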

When looking at these values, it should be relatively obvious whether the differences can be entirely explained by rounding error. NONMEM output is usually converted to scientific notation before rounding, so it won't necessarily match how R rounds numbers or the model precision specified for mrgsolve.
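
To see why the two conventions disagree, compare significant-figure rounding (scientific-notation style, as in NONMEM tables) with fixed-decimal rounding in R:

```r
x <- 1234.567
signif(x, 4)  # 1235    -- rounds to 4 significant figures
round(x, 2)   # 1234.57 -- rounds to 2 decimal places
```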

:x: Relative differences should be < 5%. If mrgsolve is automatically retrieving parameter estimates from NONMEM with high precision and the exact same dataset of doses/observations is used for validation, then relative differences should be < 1%.

If the first few observations are dramatically wrong, but others are not, then there might be an issue with how variables are assigned initial values. There have been several cases where analysts programmed something along these lines:

```cpp
X = exp(Y);
double Y = 0.5;
```

mrgsolve will allow you to do this without throwing any errors or warnings, but in the first record, X will be defined as though Y = 0 instead of the actual value of Y. If X is a function of Y, then X should be defined in the model specification file after Y.
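
The fix is simply to declare `Y` before anything that reads it:

```cpp
// Correct ordering: Y is assigned before X uses it
double Y = 0.5;
double X = exp(Y);
```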

### 6. Create arbitrary additional observations and plot them

- [ ] Create arbitrary additional observations and plot them.

:bell: An mrgsolve model translation can harbor strange and obscure bugs that are not apparent when comparing predictions only at the actual PK observation times, but that will mangle the exposure predictions you may need for forest plots or ER analysis.

Examples of problems that were not apparent until something like this was done:
- Analyst misunderstood how to program multiple observation compartments (antibody and drug for ADC), and calculated the correct value in observation compartments only for records that had a CMT value corresponding to the NONMEM observation compartment (mrgsolve outputs all compartments at all observations, in wide format)
- The wrong nocb value was applied, so that F1 for each dose was actually the F1 for the previous dose

Try something along these lines:
```R
dose_data <- filter(validation_data, EVID == 1)
new_test <- mrgsim(my_model, 
                   data = dose_data, 
                   tgrid = tgrid(start = 0, end = 500, delta = 0.05),
                   etasrc = "data.all",
                   obsonly = TRUE)

new_test |>
  as.data.frame() |>
  filter(ID %in% c(1:5)) |>
  ggplot(aes(x = TIME, y = IPRED, color = as.factor(ID), group = as.factor(ID))) +
  geom_line()
```

The output should appear smooth and continuous (except for sharp increases where IV boluses are administered). The relative height of the concentration peaks should make sense given the dose administered before each peak.

### 7. (Optional) Check residual errors

- [ ] (Optional) Check residual errors.

```R
# Compare the NONMEM residual distribution (histogram)
# with the mrgsolve residual density
ggplot(nm_res, aes(x = IRES)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1, color = "white") +
  geom_density(data = mrg_res, color = "blue", linetype = "dashed")
```

Tweak the code snippet as needed to define IRES correctly and give an appropriate bin width.
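
If IRES is not already in your table output, it is conventionally the individual residual, DV minus IPRED; a toy sketch:

```r
# Toy data standing in for the NONMEM output table
nm_res <- data.frame(DV = c(1.2, 0.8), IPRED = c(1.0, 1.0))
nm_res$IRES <- nm_res$DV - nm_res$IPRED
nm_res$IRES  # 0.2 -0.2
```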

:x: This diagnostic should end up looking very similar to the `pmplots::res_hist()` family of plots. The histogram and the density curve should both be visible and have similar spread.

## Metadata

* author: jenna-a2ai <jenna@a2-ai.com>
* qc type: mrgsolve Model Validation
* script hash: f2dc8ceb442bf77aee8824d91a11e0f3
* git sha: aced09b5e323d40eeab922726cccb1d06ed00cd2
* file history: https://github.com/A2-ai/gilead_ghqc_demo/commits/main/scripts/validate_mrgsolve_model.qmd