Note: Please modify the checklist items to insert relevant QC context.
### 1. Prepare needed R objects
- [ ] Build your mrgsolve model, in this case `my_model`. Copy parameter estimates from the NONMEM model, preferably by adding an `$NMXML` code block.
- [ ] Construct a validation dataset, in this case `validation_data`. This should contain the same dosing and observation records that went into the NONMEM model (i.e., exclude records that were flagged to be ignored by the NONMEM model), plus variables for `ETA1`, `ETA2`, etc.
- [ ] Read in the NONMEM output file with observation-level predictions. Some mild post-processing may be needed to ensure the data will merge properly - for example, adding `USUBJID` or renaming the time variable to `TIME` as required by mrgsolve. In this case, the post-processed data frame is called `nm_output`.
### 2. Generate exposure predictions from mrgsolve
- [ ] Generate exposure predictions from mrgsolve.

The argument `etasrc = "data.all"` tells mrgsolve to look for all of the ETAs as columns in `validation_data`; it will throw an error if it does not find them. The argument `obsonly = TRUE` tells mrgsolve to exclude dosing records from the output.
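The simulation call these arguments describe might look like the following (a sketch only; it assumes `my_model` and `validation_data` from step 1, and that the quantities you want to compare are listed in `$CAPTURE`):

```r
library(mrgsolve)

# Simulate at exactly the records in the validation dataset, using the
# ETA columns carried in the data rather than drawing new ones
mrg_output <- mrgsim(my_model,
                     data = validation_data,
                     etasrc = "data.all",  # ETAs must exist as data columns
                     obsonly = TRUE)       # drop dosing records from output
```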
### 3. Combine exposure predictions from mrgsolve and NONMEM
- [ ] Combine exposure predictions from mrgsolve and NONMEM.
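A sketch of the reshaping and merge, assuming long format on both sides (the mrgsolve half mirrors the NONMEM half; `mrg_output` is a placeholder name for your mrgsim result):

```r
library(dplyr)
library(tidyr)

# mrgsolve predictions in long format (one row per USUBJID/TIME/parameter)
mrg_test1 <- mrg_output |>
  as.data.frame() |>
  pivot_longer(cols = c(IPRED, CL, VC, Q1, VP1, Q2, VP2)) |>
  mutate(SOURCE = "mrgsolve") |>
  select(USUBJID, TIME, SOURCE, PARAM = name, value)

# NONMEM predictions of the same values
nm_preds <- nm_output |>
  filter(EVID == 0) |>
  pivot_longer(cols = c(IPRED, CL, VC, Q1, VP1, Q2, VP2)) |>
  mutate(SOURCE = "NONMEM") |>
  select(USUBJID, TIME, SOURCE, PARAM = name, value)

# Merge mrgsolve and NONMEM predictions into side-by-side columns
mrg_test <- bind_rows(mrg_test1, nm_preds) |>
  pivot_wider(id_cols = c(USUBJID, TIME, PARAM), names_from = SOURCE)
```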
In the `pivot_longer()` calls, substitute whatever parameter values are relevant in this scenario. Make sure they are included in the `$CAPTURE` code block of your mrgsolve model specification! When this code is run, `mrg_test` will have columns `USUBJID`, `TIME`, `PARAM`, `mrgsolve`, and `NONMEM`. The column `PARAM` will have the value `IPRED`, `CL`, `VC`, `Q1`, `VP1`, `Q2`, or `VP2`; the column `mrgsolve` will have the value of that parameter as predicted by mrgsolve; and the column `NONMEM` will have the value of that parameter as predicted by NONMEM.
:bell: `TIME` from NONMEM output may not precisely match the times given in the source dataset, since NONMEM output typically has less precision than we use when saving our derived datasets. If you find that `mrg_test1` and `nm_preds` have rows that cannot be matched on `USUBJID` and `TIME`, you can try retrieving the original `TIME` values from the source data and adding them to `nm_output`, or perhaps using the line number (`LINE` or `NUM`) as the row ID variable instead.
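One way to recover exact times is to join on the record number rather than on `TIME` (a sketch; `source_data` and the `NUM` record counter are assumptions about your particular dataset):

```r
library(dplyr)

# Replace NONMEM's rounded TIME with the full-precision TIME from the
# source data, matching rows on the shared record-number column NUM
nm_output <- nm_output |>
  left_join(select(source_data, NUM, TIME_SRC = TIME), by = "NUM") |>
  mutate(TIME = TIME_SRC) |>
  select(-TIME_SRC)
```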
### 4. Graphically examine output
- [ ] Graphically examine output.
```R
# Plot the comparison
ggplot(data = mrg_test, aes(x = NONMEM, y = mrgsolve)) +
  geom_abline(aes(color = "red", slope = 1, intercept = 0),
              key_glyph = "abline") +
  geom_point(size = 0.7) +
  facet_wrap(vars(PARAM), scales = "free") +
  theme(legend.position = "bottom",
        legend.direction = "horizontal") +
  scale_color_identity(guide = "legend", name = NULL,
                       breaks = c("red"),
                       labels = c("y = x reference line"))
```
:x: If any figures have any points visibly deviating from the line y = x, validation is not complete until these differences are eliminated (or verified to be harmless and documented in the validation file).
If all of the figures look wrong, you may not be matching the right subjects, or you might not be pulling model parameter estimates from the NONMEM output correctly.
If most of the PK parameters are falling on that line, but one or two of them are not (e.g., VC, Q1, VP1, Q2, and VP2 all look great, but CL and IPRED are all over the place - IPRED presumably being affected by CL), then your problem may be a covariate or ETA value that affects a PK parameter and is not constructed correctly in `validation_data`.
If all the PK parameters look good but IPRED is inconsistent, then the concentration predictions may not be scaled to the volume in the observation compartment (and/or the correct units). Unlike in NONMEM, there are no reserved terms for scale parameters that are automatically handled; if IPRED is observed in compartment 1, then you must manually specify that `IPRED = CENT/S1` (or whatever the appropriate variable names would be).
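In an mrgsolve model specification this scaling is written out explicitly, for example in `$TABLE` (a sketch; `CENT` and `VC` stand in for whatever your model names the observation compartment and its volume):

```cpp
$TABLE
// Divide amount by volume yourself - there is no reserved S1/S2 scaling
capture IPRED = CENT / VC;
```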
### 5. Look at observations with the most extreme differences
- [ ] Look at observations with the most extreme differences.
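A sketch of this step, assuming `mrg_test_ipred` holds the IPRED rows of `mrg_test` with difference columns added (the `ABSDIFF`/`PCTDIFF` definitions below are illustrative):

```r
library(dplyr)
library(ggplot2)
library(flextable)

# IPRED rows with absolute and percent differences (illustrative definitions)
mrg_test_ipred <- mrg_test |>
  filter(PARAM == "IPRED") |>
  mutate(ABSDIFF = abs(mrgsolve - NONMEM),
         PCTDIFF = 100 * ABSDIFF / abs(NONMEM))

# Records with the largest absolute differences
mrg_test_ipred |>
  arrange(desc(ABSDIFF)) |>
  head() |>
  flextable()

mrg_test_ipred |>
  ggplot(aes(y = ABSDIFF)) +
  geom_boxplot()

# Discard records that could have large relative differences but
# minuscule absolute differences
mrg_test_ipred_reldiffs <- mrg_test_ipred |>
  filter(mrgsolve > 1e-5 | NONMEM > 1e-5)

mrg_test_ipred_reldiffs |>
  arrange(desc(PCTDIFF)) |>
  head() |>
  flextable()

mrg_test_ipred_reldiffs |>
  ggplot(aes(y = PCTDIFF)) +
  geom_boxplot()
```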
When looking at these values, it should be relatively obvious if the differences can be entirely explained by rounding errors or not. NONMEM output is usually converted to scientific notation before rounding, so it won’t necessarily match how R rounds numbers or the model precision specified for mrgsolve.
:x: Relative differences should be < 5%. If mrgsolve is automatically retrieving parameter estimates from NONMEM with high precision and the exact same dataset of doses/observations is used for validation, then relative differences should be < 1%.
If the first few observations are dramatically wrong, but others are not, then there might be an issue with how variables are assigned initial values. There have been several cases where analysts programmed something along these lines:
```cpp
X = exp(Y)
double Y = 0.5;
```
mrgsolve will allow you to do this without throwing any errors or warnings, but in the first record, X will be defined as though Y = 0 instead of the actual value of Y. If X is a function of Y, then X should be defined in the model specification file after Y.
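A sketch of the corrected ordering (same hypothetical variables as above):

```cpp
double Y = 0.5;
double X = exp(Y);  // Y now has its intended value from the first record on
```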
### 6. Create arbitrary additional observations and plot them
- [ ] Create arbitrary additional observations and plot them.
:bell: An mrgsolve model can harbor strange and obscure bugs that are not apparent when comparing predictions only at the actual PK observation times, but that will mangle the exposure predictions you may need for forest plots or ER analysis.
Examples of problems that were not apparent until something like this was done:
- Analyst misunderstood how to program multiple observation compartments (antibody and drug for ADC), and calculated the correct value in observation compartments only for records that had a CMT value corresponding to the NONMEM observation compartment (mrgsolve outputs all compartments at all observations, in wide format)
- The wrong `nocb` setting was applied, so that F1 for each dose was actually the F1 for the previous dose
Try something along these lines:
```R
dose_data <- filter(validation_data, EVID == 1)
new_test <- mrgsim(my_model,
                   data = dose_data,
                   tgrid = tgrid(start = 0, end = 500, delta = 0.05),
                   etasrc = "data.all",
                   obsonly = TRUE)

new_test |>
  as.data.frame() |>
  filter(ID %in% 1:5) |>
  ggplot(aes(x = TIME, y = IPRED, color = as.factor(ID), group = as.factor(ID))) +
  geom_line()
```
The output should appear smooth and continuous (except for sharp increases where IV boluses are administered). The relative height of the concentration peaks should make sense given the dose given before each peak.
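### 7. (Optional) Check residual errors.
- [ ] Check residual errors.

A sketch of the comparison, assuming `nm_res` holds NONMEM's individual residuals and `mrg_res` the corresponding mrgsolve residuals, each with an `IRES` column:

```r
library(ggplot2)

# Overlay the NONMEM residual histogram with the mrgsolve residual density
ggplot(nm_res, aes(x = IRES)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1, color = "white") +
  geom_density(data = mrg_res, color = "blue", linetype = "dashed")
```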
Tweak the code snippet as needed to define IRES correctly and to give an appropriate bin width.
:x: This diagnostic should end up looking very similar to the `pmtables::res_hist()` family of plots. The histogram and the density curve should both be visible and have similar spread.
## Metadata
* author: jenna-a2ai <jenna@a2-ai.com>
* qc type: mrgsolve Model Validation
* script hash: f2dc8ceb442bf77aee8824d91a11e0f3
* git sha: 3bacb8150e08e60d8fd61485e8e065a2139d4b66
* file history: https://github.com/A2-ai/gilead_ghqc_demo/commits/main/scripts/validate_mrgsolve_model.qmd