gasparrini / dlnm

R package dlnm
75 stars 13 forks source link

running a sample code #15

Closed saeedashraf closed 1 month ago

saeedashraf commented 1 month ago

Hello team,

can you help me to figure out the following error in the sample code afterward?

error:

> pred_temp <- crosspred(cb_temp, model, at = temp_range, cumul = TRUE)
Error in crosspred(cb_temp, model, at = temp_range, cumul = TRUE) : 
  coef/vcov not consistent with basis matrix. See help(crosspred)

Sample Code:


library(dlnm)
library(splines)

data <- data.frame(
  date = as.Date(c('1995-05-08', '1995-05-09', '1995-05-10', '1995-05-11', '1995-05-12')),
  death_count = c(14.0, 18.0, 16.0, 8.0, 9.0),
  tmax = c(23.1, 23.5, 22.7, 21.6, 19.2)
)

temp_col <- data$tmax
death_col <- data$death_count
max_lag <- 3

cat("NA values in temp_col:", sum(is.na(temp_col)), "\n")
cat("NA values in death_col:", sum(is.na(death_col)), "\n")

cb_temp <- crossbasis(temp_col, lag = max_lag, argvar = list(fun = "lin"), arglag = list(fun = "lin"))
print("Crossbasis created successfully.")

model <- glm(death_col ~ cb_temp, family = quasipoisson())
print("Model fitted successfully.")

print("Summary of the fitted model:")
summary(model)

temp_range <- seq(min(temp_col, na.rm = TRUE), max(temp_col, na.rm = TRUE), length.out = 100)
pred_temp <- crosspred(cb_temp, model, at = temp_range, cumul = TRUE)

mmt_index <- which.max(death_col)
mmt_value <- temp_col[mmt_index]
cat("Minimum Mortality Temperature (MMT):", mmt_value, "\n")

plot(pred_temp, "slices", var = median(temp_range), col = 3,
     ylab = "Relative Risk (RR)", ci.arg = list(density = 15, lwd = 2),
     main = "Relative Risk of Mortality vs. Tmax")

abline(v = mmt_value, col = "red", lwd = 2, lty = 2)
legend("topright", legend = paste("MMT =", round(mmt_value, 2)), col = "red", lty = 2, bty = "n")
gasparrini commented 1 month ago

You are trying to fit a GLM with only 5 observations. The model is fitted, but likely because of collinearity and limited power one of the coefficients of the cross-basis is set as NA (it is not estimable). Therefore, you cannot make a prediction. If you look at the help page of crosspred(), as advised, you can see this issues described in the section Warnings.

Best -Antonio Gasparrini

saeedashraf commented 2 weeks ago

Even if we extend the time series to 10 years I receive the same error

gasparrini commented 2 weeks ago

As advised in the previous message, you need to check if you have NA's in either the coef or vcov. Otherwise, please create a reproducible example and send data and code.

saeedashraf commented 2 weeks ago

thanks...

merged_data_climate_mortality_ZH_v2only_Internal_factors_ICD_10(1).csv

# Load necessary libraries
library(dlnm)
library(splines)

# Load your CSV file
data <- read.csv("merged_data_climate_mortality_ZH_v2only_Internal_factors_ICD_10(1).csv")

# Define your columns
temp_col <- data$KLO_Obs_Tmax
death_col <- data$death_count
max_lag <- 3  # Define maximum lag

# Check for NA values
cat("NA values in temp_col:", sum(is.na(temp_col)), "\n")
cat("NA values in death_col:", sum(is.na(death_col)), "\n")

# Create a crossbasis for the temperature variable with linear lag
cb_temp <- crossbasis(temp_col, lag = max_lag, argvar = list(fun = "lin"), arglag = list(fun = "lin"))
print("Crossbasis created successfully.")

# Fit the GLM model with quasi-Poisson family
model <- glm(death_col ~ cb_temp, family = quasipoisson())
print("Model fitted successfully.")

# Summary of the fitted model
print("Summary of the fitted model:")
summary(model)

# Set up temperature range for prediction
temp_range <- seq(min(temp_col, na.rm = TRUE), max(temp_col, na.rm = TRUE), length.out = 100)

# Predict using crosspred
pred_temp <- crosspred(cb_temp, model, at = temp_range, cumul = TRUE)

# Calculate the Minimum Mortality Temperature (MMT)
# MMT is defined as the temperature associated with the maximum death count
mmt_index <- which.max(death_col)  # Find index of maximum death count
mmt_value <- temp_col[mmt_index]    # Get corresponding temperature
cat("Minimum Mortality Temperature (MMT):", mmt_value, "\n")

# Plot Relative Risk (RR) against Tmax
plot(pred_temp, "slices", var = median(temp_range), col = 3,
     ylab = "Relative Risk (RR)", ci.arg = list(density = 15, lwd = 2),
     main = "Relative Risk of Mortality vs. Tmax")

# Add a vertical line for MMT
abline(v = mmt_value, col = "red", lwd = 2, lty = 2)
legend("topright", legend = paste("MMT =", round(mmt_value, 2)), col = "red", lty = 2, bty = "n")
gasparrini commented 2 weeks ago

I ran the code and the call to crosspred() (line 33) runs without errors. An error is thrown later when calling plot() (line 42), because you select var = median(temp_range), which is not among the values chosen for prediction in crosspred(). You can check them using pred_temp$predvar. This is likely because the median is between two actual values, and the mid-value is taken. Try using var = temp_range[50] and it works.

Next time, please check the error messages more carefully.

Best -Antonio Gasparrini

saeedashraf commented 5 days ago

Thank you so much, Antonio,

I agree, and your hints were super useful. I tried to reach out by email, but I understand how busy you are. So, I did some predictions for future scenarios based on the steps we performed: Calculation of RR vs. temperature for historical data. The results show a reasonable pattern. (Thanks for your help!). Then, we used the model for future predictions, considering temperature (knowing that we do not have predictions for mortality). The models produced the results; however, I see two issues: the RR scale is up even to 2000 and 300, given that temperature does not increase that much. I have attached our code, inputs, and results. Would you please give us another hint?

Best regards, Saeid Vaghefi

# Load necessary libraries
library(dlnm)
library(splines)
library(ggplot2)

calculate_rr_future <- function(df, df_future, his_temp_col_name, fu_temp_cols, rcp_label, death_col_name, ref_temp, quantile_threshold=0.25, df_spline=3, window_size=7) {
  rr_list <- list()  # Initialize empty list to store RR results
  max_lag <- window_size  # Define maximum lag
  temp_col <- df[[his_temp_col_name]]  # Historical temperature column
  death_col <- df[[death_col_name]]  # Death column

  # Create a crossbasis for the historical temperature with splines for lag effects
  cb_temp <- crossbasis(temp_col, lag = max_lag,
                        argvar = list(fun = "ns", df = df_spline),
                        arglag = list(fun = "ns", df = df_spline),
                        )

  # Fit the GLM model with quasi-Poisson family
  model <- glm(death_col ~ cb_temp, family = quasipoisson())

  # Set up temperature range for prediction
  temp_range <- seq(min(temp_col, na.rm = TRUE), max(temp_col, na.rm = TRUE), length.out = 100)  # Adjust length.out as needed

  # Predict relative risk (RR) using crosspred
  pred <- crosspred(cb_temp, model, at = temp_range, cen = reference_temperature, cumul = TRUE)

  # Extract predicted RR values and temperature
  rr_fit <- pred$allRRfit

  # Calculate Minimum Mortality Temperature (MMT)
  mmt <- temp_range[which.min(abs(rr_fit - 1))]
  # Plot Relative Risk of Mortality vs. Temperature
  plot(temp_range, rr_fit, type = "l", col = "blue", lwd = 2,
       xlab = "Tmax", ylab = "Relative Risk (RR)",
       main = paste("Relative Risk of Mortality vs. Tmax", "Historical Data"))
  abline(v = mmt, col = "green", lty = 2, lwd = 2)  # MMT line
  abline(h = 1, col = "red", lty = 2)  # Reference line at RR = 1

  # Add text annotation for MMT
  text(mmt, max(rr_fit), labels = paste("MMT =", round(mmt, 2), "°C"), col = "green", pos = 4)

  # Loop over each future temperature column
  for (fu_temp_col_name in fu_temp_cols) {
    fu_temp_col <- df_future[[fu_temp_col_name]]  # Extract future temperature column
    fu_date_col <- df_future[["date"]]  # Extract future date column

    # Set up temperature range for prediction
    temp_range_future <- seq(min(fu_temp_col, na.rm = TRUE), max(fu_temp_col, na.rm = TRUE), length.out = length(fu_temp_col))

    # Predict future relative risk (RR) using crosspred
    pred_future <- crosspred(cb_temp, model, at = temp_range_future, cen = ref_temp, cumul = TRUE)

    # Extract predicted RR values
    rr_fit2 <- pred_future$allRRfit

    # Create a DataFrame for this RCP column's RR values
    rr_df <- data.frame(
      Temperature = temp_range_future,
      RR = rr_fit2,
      date = fu_date_col,
      RCP = rcp_label
    )
    rr_list[[fu_temp_col_name]] <- rr_df  # Append to list
  }

  # Combine all RR DataFrames into one
  do.call(rbind, rr_list)
}

# Load your CSV files
merged_data <- read.csv("C:/not share/NCCS/NCCS-RESULT/test/UR_merged_data.csv")
future_climate_data <- read.csv("C:/not share/NCCS/NCCS-RESULT/test/UR_future_climate_data.csv")

window_size <- 7  # Number of days to consider in the lag
df_spline <- 3
reference_temperature <- mean(merged_data$Tmax, na.rm = TRUE)  # Adjusted to calculate the mean of 'Tmax' in R

# Filter columns for each RCP scenario
rcp26_columns <- grep("RCP26", names(future_climate_data), value = TRUE)
rcp45_columns <- grep("RCP45", names(future_climate_data), value = TRUE)
rcp85_columns <- grep("RCP85", names(future_climate_data), value = TRUE)

# Calculate RR future for each RCP scenario
rr_future_26 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp26_columns, "RCP26", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)
rr_future_45 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp45_columns, "RCP45", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)
rr_future_85 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp85_columns, "RCP85", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)

rr_future_all <- do.call(rbind, list(rr_future_26, rr_future_45, rr_future_85))

# Load the lubridate library for date manipulation
library(lubridate)

# Ensure 'date' column is in Date format
rr_future_all$date <- as.Date(rr_future_all$date)
rr_future_26$date <- as.Date(rr_future_26$date)
rr_future_45$date <- as.Date(rr_future_45$date)
rr_future_85$date <- as.Date(rr_future_85$date)

# Extract year and assign decade for rr_future_all
rr_future_all$year <- year(rr_future_all$date)
rr_future_all$decade <- (rr_future_all$year %/% 10) * 10

# Extract year and assign decade for each RCP dataframe
rr_future_26$year <- year(rr_future_26$date)
rr_future_26$decade <- (rr_future_26$year %/% 10) * 10

rr_future_45$year <- year(rr_future_45$date)
rr_future_45$decade <- (rr_future_45$year %/% 10) * 10

rr_future_85$year <- year(rr_future_85$date)
rr_future_85$decade <- (rr_future_85$year %/% 10) * 10

# Define color palette
palette <- c("RCP26" = "green", "RCP45" = "orange", "RCP85" = "red")

# Create boxplot for RR by Decade and RCP Scenario
# Split the data by RCP and then plot each RCP's data with `boxplot()`
par(mfrow = c(1, 1))  # Reset to single plot layout

# Subset data for each RCP
rcp26_data <- rr_future_all[rr_future_all$RCP == "RCP26", ]
rcp45_data <- rr_future_all[rr_future_all$RCP == "RCP45", ]
rcp85_data <- rr_future_all[rr_future_all$RCP == "RCP85", ]

# Define color palette
palette <- c("RCP26" = "green", "RCP45" = "orange", "RCP85" = "red")

# Create boxplot for RR by Decade and RCP Scenario
# Split the data by RCP and then plot each RCP's data with `boxplot()`
par(mfrow = c(1, 1))  # Reset to single plot layout

# Subset data for each RCP
rcp26_data <- rr_future_all[rr_future_all$RCP == "RCP26", ]
rcp45_data <- rr_future_all[rr_future_all$RCP == "RCP45", ]
rcp85_data <- rr_future_all[rr_future_all$RCP == "RCP85", ]

# Boxplot with base R
boxplot(RR ~ decade, data = rcp26_data, col = palette["RCP26"], 
        ylim = range(rr_future_all$RR, na.rm = TRUE), 
        xlab = "Decade", ylab = "Relative Risk (RR)",
        main = paste('test', "Relative Risk (RR) by Decade and RCP Scenario"))
boxplot(RR ~ decade, data = rcp45_data, col = palette["RCP45"], add = TRUE)
boxplot(RR ~ decade, data = rcp85_data, col = palette["RCP85"], add = TRUE)
legend("topright", legend = names(palette), fill = palette, title = "RCP Scenario")

# Calculate yearly average RR for line plot
rr_yearly_avg <- aggregate(RR ~ year + RCP, data = rr_future_all, mean, na.rm = TRUE)

# Create line plot for RR by Year and RCP Scenario
par(mfrow = c(1, 1))  # Reset to single plot layout
plot(NULL, xlim = range(rr_yearly_avg$year), ylim = range(rr_yearly_avg$RR, na.rm = TRUE),
     xlab = "Year", ylab = "Average Relative Risk (RR)", 
     main = paste('test', "Trend of Relative Risk (RR) by Year and RCP Scenario"))

# Add lines for each RCP scenario
for (rcp in names(palette)) {
  rcp_data <- rr_yearly_avg[rr_yearly_avg$RCP == rcp, ]
  lines(rcp_data$year, rcp_data$RR, type = "b", col = palette[rcp], pch = 16)
}

# Add legend
legend("topright", legend = names(palette), col = palette, pch = 16, title = "RCP Scenario")

future_decade_prediction_r future_yearly_prediction historical_rr UR_future_climate_data.csv UR_merged_data.csv

gasparrini commented 5 days ago

Hi saeedashraf,

thanks for your interest and your question. However, I prefer keeping this space for discussions strictly focused on the dlnm package, and not general analyses.

best -Antonio Gasparrini