fragla / eq5d

Other
21 stars 4 forks source link

Documentation is unclear on which of the DSU models is being applied in `eq5dmap` #25

Closed DKBurns closed 1 year ago

DKBurns commented 1 year ago

Introduction

Firstly, I'd just like to say thank you for producing a package which applies the NICE DSU mapping. It is appreciated by many people in academia and HEOR.

The issue

However, the documentation surrounding the functions eq5dmap and eq5d require some fleshing out to clarify the methods being applied, as well as the datasets they are being applied to. There are several candidate datasets in the 2020 report (see below) which could all technically be described as "the DSU model". For instance, the help file for eq5dmap has the following description:

Conditional prediction of the utility values of 5L scores onto 3L value sets and 3L scores onto 5L value sets from observed or specified values conditional on age and gender using the NICE Decision Support Unit's models (see NICE DSU's website for more information).

Firstly, this link does not work - the DSU must have changed the location of that page. Perhaps it is the case that the original page at this location did indeed describe the meaning of eq5dmap with the option version set to "5L" and country to "UK". However, there is currently no way to check this, or to refer to the correct matierials this was indended to redirect the analyst.

Secondly, there are four datsets in the "DSU models", whereas the NICE methods guidance suggests using one specifically (See here point 4.3.16). For reference, the specific quote in the NICE methods guidance is:

The mapping function developed by the Decision Support Unit (Hernández Alava et al. 2017), using the 'EEPRU dataset' (Hernández Alava et al. 2020), should be used for reference-case analyses.

Within this article:

Hernandez, M., Pudney, S., Wailoo, A. (2020) The EQ-5D-5L Value Set for England: Findings of a Quality Assurance Program. Value in Health, Preferenced-Based Assessments, 23(5), pp. 642-648, 1 May 2020. DOI: https://doi.org/10.1016/j.jval.2019.10.017

The publication by NICE

and the follow up article:

Hernández Alava, M., Pudney, S. & Wailoo, A. (2022) Estimating the Relationship Between EQ-5D-5L and EQ-5D-3L: Results from a UK Population Study. PharmacoEconomics. https://doi.org/10.1007/s40273-022-01218-7

There are several datasets to consider, including EQG, FORWARD, IP and EEPRU, all of which would produce (slightly) different results. Some additional description in the R package documentation would be extremely useful in ensuring that those using the package understand the process to be undertaken and produce the correctly mapped results for their analyses.

Proposal

Finally, I have managed to replicate the results from the NICE page R examples using the eq5dmap function like so:

Code provided in NICE DSU website

The original code defines a function to calculate the Epan weightings, and then another to apply them. This is then applied in a different file in the zip provided to produce the values below


fun_Epan = function(target, values, bwidth){ # Generate Epan weight
  distance = values - target # Calculate distance from input
  Epan = case_when(abs(distance) >= bwidth ~ 0,     # If outside bandwidth weight = 0
                   TRUE ~ 1 - (distance/bwidth)^2)
  return(Epan)
}

fun_map = function(var_age, var_male, input, df, summary = 1, bwidth = 0, output = "3L"){
  # var_age = Age, var_male = Male, summary = 0/1 for if summary scores are provided
  # Input: either individual domains, or summary score. E.g. 'input = c(1, 3, 5, 1, 5)' or 'input = 0.75'
  # bwidth = bandwidth (caliper) - either zero for exact matches, or > 0.

  if(output == "3L") my_df = df_5 else my_df = df_3   # Check which output df to use

  # Ensure character input and change age and male to match column names of my_df
  var_age = as.character(var_age)
  var_male = as.character(var_male)
  input = as.character(input)
  df = df %>% rename("X_male" = all_of(var_male), "tmp_age" = all_of(var_age)) %>%
    # New code - map age to age-band. Input will either be individual age [16-100] or age-band integer [1-5]
    mutate(X_age = case_when(tmp_age >= 16 & tmp_age < 35 ~ 1,
                             tmp_age >= 35 & tmp_age < 45 ~ 2,
                             tmp_age >= 45 & tmp_age < 55 ~ 3,
                             tmp_age >= 55 & tmp_age < 65 ~ 4,
                             tmp_age >= 65 & tmp_age <= 100 ~ 5,
                             tmp_age >= 1 & tmp_age <= 5 ~ as.double(tmp_age),
                             TRUE ~ 9999))

  if(max(df$X_age) == 9999) stop('Supplied age must be either between 16 and 100 or an age band (1 to 5)')

  if (summary == 0){ # Individual domains provided
    df[input] = sapply(df[input], as.character)   # Convert domains to character
    df = df %>% mutate(Domain = pmap_chr(df[input], str_c, collapse = "")) %>% # Collapse domain to single var & join 
      left_join(select(my_df, c(Domain, Output, X_age, X_male)), by = c("Domain", "X_age", "X_male"))

  } else if (summary == 1){ # Summary score provided
    df = df %>% rename("X_U5" = all_of(input)) %>% mutate(ID = 1:n()) # Change outcomes to match, add unique ID
    if (bwidth == 0){ # Exact score provided, lookup on this
      df = left_join(df, select(my_df, c(X_U5, Output, X_age, X_male)), by = c("X_U5", "X_age", "X_male")) %>%
        group_by(ID) %>% summarise(X_age = mean(X_age), X_male = mean(X_male), X_U5 = mean(X_U5), Output = mean(Output))
      #if (is.na(df$Output)) df$Output = "No exact matches found, try setting bandwidth > 0"

    } else { # Approximate score provided, weighted average over all observations (matched by age and sex).
      df = df %>% mutate(tmp = list(my_df),
                    data = pmap(list(X_age, X_male, tmp), function(x, y, z) filter(z, X_age == x & X_male == y)),
                    Epan = map2(X_U5, data, function(x, y) fun_Epan(target = x, values = y$X_U5, bwidth = bwidth)),
                    Output = map2_dbl(data, Epan, function(x, y) weighted.mean(x = x$Output, w = y))) %>%
        select(-c(tmp, data, Epan)) %>% unnest_legacy()
    }
  } else my_out = "Incorrect value for summary, must be 0 or 1"
  return(df) 
}

fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)  # From a 5L utility score

This produces:

  tmp_age X_male  X_U5 X_age ID    Output
1      50      1 0.715     3  1 0.6701727
2      30      0 0.435     1  2 0.3015699
3      70      1 0.950     5  3 0.9348593

The eq5dmap function can replicate this, but only to 3 decimal places:

EQdata2$mapped_3L <- unlist(lapply(1:nrow(EQdata2), function(data_point) {
  eq5dmap(
    scores = EQdata2$util5L[data_point],
    country = "UK",
    version = "5L",
    age = EQdata2$age[data_point],
    sex = ifelse(EQdata2$male[data_point] == 1,"male","female"),
    bwidth = 0.0001
  )
}))

to produce:

  age male util5L mapped_3L
1  50    1  0.715     0.670
2  30    0  0.435     0.302
3  70    1  0.950     0.935

This only covers the approximate example using 5L utility scores. However, I hope it illustrates how useful it would be to include a vignette to the eq5d package which demonstrates to people how to use the functions and exactly the methods/data being used too.

Again, thank you very much for this package - it is extremely useful!

fragla commented 1 year ago

Thanks, I'm glad you find the package useful!

Firstly, this link does not work - the DSU must have changed the location of that page.

I noticed the URL had changed shortly after the last release. It's been changed in the development version and will be updated on CRAN at the next release.

  • Flesh out the eq5d package documentation to more specifically state which methods are being applied to which datasets
  • Potentially include a short vignette from the supplementary matierals zip file provided here, incorporating these examples and potentially revising to use the eq5dmap function rather than programming an entirely new function within the example.

In the short term I can add to the README/docs that it's the EEPRU data that's used. I'm probably not going to be able to write a new vignette for a while. However, I'm happy for you to do this and submit a pull request (I'm always happy for others to contribute to the package). I'd use the eq5d wrapper function rather than eq5dmap directly which automatically handles multiple values. e.g.

dat <- data.frame(Age = c(50,30,70), Sex = c("m","f","m"), Utility = c(0.715,0.435,0.95))
dat
#>   Age Sex Utility
#> 1  50   m   0.715
#> 2  30   f   0.435
#> 3  70   m   0.950

eq5d(dat, country = "UK", type = "DSU", version = "5L", bwidth = 0.0001)
#> [1] 0.670 0.302 0.935

The eq5dmap function can replicate this, but only to 3 decimal places

This can be easily changed to allow the user to define the precision.

Thanks again for using the package - it's great that people are finding it useful and it's always good to get feedback/suggestions.

DKBurns commented 1 year ago

Thanks for responding so quickly.

Firstly, thanks for updating the link, I'm sure people will find that useful.

Second, Thanks for the instructions for doing the analysis in a more vectorised way, it is appreciated considering the many values that one would typically have to cycle through to perform mapping on e.g. a clinical trial analysis.

Below I attach a stand-alone script which downloads the code from the DSU website, unzips it into a folder of your choosing and runs through a comparison of that code with the eq5d package commands (with updates following on from your response above - thanks again!)

Please note that I think I spotted a bug with the eq5d() command - bwidth doesn't affect the results when using eq5d, but does (and does match the DSU website code) when using eq5dmap. Hopefully an easy fix. For now, there's code below which uses lapply to iterate through the observations to apply eq5dmap.

Hopefully useful,

Darren

#Load data
library(readxl)
library(tidyverse)
library(eq5d)

rm(list = ls())

# Source materials --------------------------------------------------------

# The source material is available here: https://www.sheffield.ac.uk/nice-dsu/methods-development/mapping-eq-5d-5l-3l
# Download link: https://www.sheffield.ac.uk/media/34041/download?attachment

# Automatically downloading into folder. Change the zip_dest parameter to change the destination.

# Set this to the folder location you would like to perform the replication within.
# Here I use a folder that is inside of an .Rproj I am working in.
zip_dest <- "./data/examples/mapping/"

temp <- tempfile()
download.file("https://www.sheffield.ac.uk/media/34041/download?attachment",temp)
data <- unz(temp, zip_dest)
unlink(temp)

# Running the function definitions ----------------------------------------

# There are 2 R scripts, one with functions to perform mapping and the other
# to do the below (before we perform replication with the functions inside
# the eq5d package)

source(file.path(zip_dest,"EQ5Dmap_March22.R"))

# Example datasets --------------------------------------------------------

EQ5Dmap1 <- read_excel(file.path(zip_dest,"eq5dmapUK - data1.xlsx"))
EQdata1  <- data.frame(EQ5Dmap1)#convert to dataframe

#run example on 5L exact utility score
EQ5Dmap2 <- read_excel(file.path(zip_dest,"eq5dmapUK - data2.xlsx"))
EQdata2  <- data.frame(EQ5Dmap2)#convert to dataframe

# 5L descriptive ----------------------------------------------------------

#run example on  5L descriptive 
fun_map("age", "male", input = c("Y5_1", "Y5_2", "Y5_3", "Y5_4", "Y5_5"), EQdata1, summary = 0)  

EQ5D_desc <- EQdata1
# To replicate this using the function in the eq5d package, we could cycle down individuals:
EQ5D_desc$util5L <- unlist(lapply(1:nrow(EQdata1), function(eq5d_response) {
  vals <- as.numeric(EQdata1[eq5d_response,paste0("Y5_",1:5)])
  names(vals) <- c("MO", "SC", "UA", "PD", "AD")
  eq5d(
    scores = vals,
    version = "5L",
    type = "DSU",
    country = "UK",
    age = EQdata1[eq5d_response,"age"],
    sex = ifelse(EQdata1[eq5d_response,"male"] == 1, "male","female")
  )
}))
EQ5D_desc

# As pointed out by the packate author in an issue I raised here: https://github.com/fragla/eq5d/issues/25
# This should also be possible using the eq5d command without needing to iterate:

EQ5D_desc <- EQ5D_desc |> 
  rename(
    MO = Y5_1,
    SC = Y5_2,
    UA = Y5_3,
    PD = Y5_4,
    AD = Y5_5
  ) |> 
  mutate(
    sex = ifelse(male == 0, "f", "m")
  )

# As below, one can calculate scores based on the individual responses in bulk like so.
# these match the values when doing this individually (one row at a time)
EQ5D_desc$util5L_2 <- eq5d(
  scores = data.frame(
    MO      = EQ5D_desc$MO,
    SC      = EQ5D_desc$SC,
    UA      = EQ5D_desc$UA,
    PD      = EQ5D_desc$PD,
    AD      = EQ5D_desc$AD,
    Age     = EQ5D_desc$age,
    Sex     = EQ5D_desc$sex
  ),
  version = "5L",
  type = "DSU",
  country = "UK"
)

EQ5D_desc

# Mapping approximate method -------------------------------

# Code from the DSU website:
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)  # From a 5L utility score

# The below uses the eq5dmap function to replicate the results of the above:
EQdata2$mapped_3L <- unlist(lapply(1:nrow(EQdata2), function(data_point) {
  eq5dmap(
    scores = EQdata2$util5L[data_point],
    country = "UK",
    version = "5L",
    age = EQdata2$age[data_point],
    sex = ifelse(EQdata2$male[data_point] == 1,"male","female"),
    bwidth = 0.0001
  )
}))

EQdata2

# run example on line 6 5L approx utility score
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)  # From a summary 5L utility score

EQdata2$mapped_3L_bw01 <- unlist(lapply(1:nrow(EQdata2), function(data_point) {
  eq5dmap(
    scores = EQdata2$util5L[data_point],
    country = "UK",
    version = "5L",
    age = EQdata2$age[data_point],
    sex = ifelse(EQdata2$male[data_point] == 1,"male","female"),
    bwidth = 0.1
  )
}))
EQdata2

# As before, this can be done without having to iterate the function, and also
# with the eq5d() function instead of the mapping function. On advice from 
# the package author, this is how to accomplish this:

eq2_dat <- data.frame(
  Age     = EQdata2$age,
  Sex     = ifelse(EQdata2$male == 0, "f", "m"),
  Utility = EQdata2$util5L
)

# Possible bug here: These both produce the same result

eq2_dat$mapped_3L_DSU_0001 <- eq5d(
    scores  = eq2_dat,
    country = "UK",
    type    = "DSU",
    version = "5L",
    bwidth  = 0.0001
  )
eq2_dat$mapped_3L_DSU_1 <- eq5d(
    scores  = eq2_dat,
    country = "UK",
    type    = "DSU",
    version = "5L",
    bwidth  = 0.1
  )

# Comparison:
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)
EQdata2
eq2_dat

# Is bwidth being ignored in the eq5d command?

Console output:

> fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)
  tmp_age X_male  X_U5 mapped_3L mapped_3L_bw01 X_age ID    Output
1      50      1 0.715     0.670          0.637     3  1 0.6365827
2      30      0 0.435     0.302          0.297     1  2 0.2968268
3      70      1 0.950     0.935          0.844     5  3 0.8440490
> fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)
  tmp_age X_male  X_U5 mapped_3L mapped_3L_bw01 X_age ID    Output
1      50      1 0.715     0.670          0.637     3  1 0.6701727
2      30      0 0.435     0.302          0.297     1  2 0.3015699
3      70      1 0.950     0.935          0.844     5  3 0.9348593
> fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)
  tmp_age X_male  X_U5 mapped_3L mapped_3L_bw01 X_age ID    Output
1      50      1 0.715     0.670          0.637     3  1 0.6365827
2      30      0 0.435     0.302          0.297     1  2 0.2968268
3      70      1 0.950     0.935          0.844     5  3 0.8440490
> EQdata2
  age male util5L mapped_3L mapped_3L_bw01
1  50    1  0.715     0.670          0.637
2  30    0  0.435     0.302          0.297
3  70    1  0.950     0.935          0.844
> eq2_dat
  Age Sex Utility mapped_3L_DSU_0001 mapped_3L_DSU_1
1  50   m   0.715              0.670           0.670
2  30   f   0.435              0.302           0.302
3  70   m   0.950              0.935           0.935
fragla commented 1 year ago

Please note that I think I spotted a bug with the eq5d() command - bwidth doesn't affect the results when using eq5d, but does (and does match the DSU website code) when using eq5dmap. Hopefully an easy fix. For now, there's code below which uses lapply to iterate through the observations to apply eq5dmap.

Thanks for spotting the issue with the bwidth argument in the eq5d function. I've tracked it down. This was due to the way the eq5d function had been set up. DSU docs advise approximate scores for bwidth are < 0.8: 0.2, 0.8-0.951: 0.1, 0.951-1: small, but large enough to include 1. The eq5d function code was setup so that when a data.frame was submitted a "bwidth" column could be included which could contain different bwidth values for each score. When a single number was passed as the argument it was ignored as it couldn't find a column with that name in the data.frame. It's now been changed (and additional tests added) so that either a single column name or single number can be passed as an argument. The updated function is now available in the development version.

DKBurns commented 1 year ago

I see - so if I add a column to the data.frame called bwidth, it will control the bandwidth for each individual observation.

For the benefit of the public - here is 5L to 3L mapping working within the current release of the eq5d pacakge, in line with the NICE DSU guidance to be used for all NICE submissions requiring such mapping 😃

Thanks again @fragla - this has been very useful, and hopefully this conversation can help others to ensure that they are performing EQ5D mapping in line with NICE methods guidance!

@DKBurns

# Possible bug here: These both produce the same result
# 
# UPDATE: bug solved via adding a column for bandwidth to the data.frame

eq2_dat$bwidth <- rep(0.0001,nrow(eq2_dat))
eq2_dat$mapped_3L_DSU_0001 <- eq5d(
    scores  = eq2_dat,
    country = "UK",
    type    = "DSU",
    version = "5L"
  )

eq2_dat$bwidth <- rep(0.1,nrow(eq2_dat))
eq2_dat$mapped_3L_DSU_1 <- eq5d(
    scores  = eq2_dat,
    country = "UK",
    type    = "DSU",
    version = "5L"
  )

# Comparison:
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)
fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)
EQdata2
eq2_dat
> # Comparison:
> fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.0001)
  tmp_age X_male  X_U5 mapped_3L mapped_3L_bw01 X_age ID    Output
1      50      1 0.715     0.670          0.637     3  1 0.6701727
2      30      0 0.435     0.302          0.297     1  2 0.3015699
3      70      1 0.950     0.935          0.844     5  3 0.9348593
> fun_map("age", "male", input = "util5L", EQdata2, summary = 1, bwidth =0.1)
  tmp_age X_male  X_U5 mapped_3L mapped_3L_bw01 X_age ID    Output
1      50      1 0.715     0.670          0.637     3  1 0.6365827
2      30      0 0.435     0.302          0.297     1  2 0.2968268
3      70      1 0.950     0.935          0.844     5  3 0.8440490
> EQdata2
  age male util5L mapped_3L mapped_3L_bw01
1  50    1  0.715     0.670          0.637
2  30    0  0.435     0.302          0.297
3  70    1  0.950     0.935          0.844
> eq2_dat
  Age Sex Utility bwidth mapped_3L_DSU_0001 mapped_3L_DSU_1
1  50   m   0.715    0.1              0.670           0.637
2  30   f   0.435    0.1              0.302           0.297
3  70   m   0.950    0.1              0.935           0.844
fragla commented 1 year ago

I see - so if I add a column to the data.frame called bwidth, it will control the bandwidth for each individual observation.

Yes, that's correct (in the current release). In the development version you can now pass a single bwidth value to be used for all observations or have a column with a bandwidth for each individual observation. I'll update the docs before the next release and add a specific "NICE DSU" vignette to clarify things. Thanks @DKBurns for submitting this issue, it's been helpful.