Closed saeedashraf closed 1 month ago
You are trying to fit a GLM with only 5 observations. The model is fitted, but likely because of collinearity and limited power one of the coefficients of the cross-basis is set as NA (it is not estimable). Therefore, you cannot make a prediction. If you look at the help page of crosspred(), as advised, you can see this issues described in the section Warnings.
Best -Antonio Gasparrini
Even if we extend the time series to 10 years I receive the same error
As advised in the previous message, you need to check if you have NA's in either the coef or vcov. Otherwise, please create a reproducible example and send data and code.
thanks...
merged_data_climate_mortality_ZH_v2only_Internal_factors_ICD_10(1).csv
# Load necessary libraries
library(dlnm)
library(splines)
# Load your CSV file
data <- read.csv("merged_data_climate_mortality_ZH_v2only_Internal_factors_ICD_10(1).csv")
# Define your columns
temp_col <- data$KLO_Obs_Tmax
death_col <- data$death_count
max_lag <- 3 # Define maximum lag
# Check for NA values
cat("NA values in temp_col:", sum(is.na(temp_col)), "\n")
cat("NA values in death_col:", sum(is.na(death_col)), "\n")
# Create a crossbasis for the temperature variable with linear lag
cb_temp <- crossbasis(temp_col, lag = max_lag, argvar = list(fun = "lin"), arglag = list(fun = "lin"))
print("Crossbasis created successfully.")
# Fit the GLM model with quasi-Poisson family
model <- glm(death_col ~ cb_temp, family = quasipoisson())
print("Model fitted successfully.")
# Summary of the fitted model
print("Summary of the fitted model:")
summary(model)
# Set up temperature range for prediction
temp_range <- seq(min(temp_col, na.rm = TRUE), max(temp_col, na.rm = TRUE), length.out = 100)
# Predict using crosspred
pred_temp <- crosspred(cb_temp, model, at = temp_range, cumul = TRUE)
# Calculate the Minimum Mortality Temperature (MMT)
# MMT is defined as the temperature associated with the maximum death count
mmt_index <- which.max(death_col) # Find index of maximum death count
mmt_value <- temp_col[mmt_index] # Get corresponding temperature
cat("Minimum Mortality Temperature (MMT):", mmt_value, "\n")
# Plot Relative Risk (RR) against Tmax
plot(pred_temp, "slices", var = median(temp_range), col = 3,
ylab = "Relative Risk (RR)", ci.arg = list(density = 15, lwd = 2),
main = "Relative Risk of Mortality vs. Tmax")
# Add a vertical line for MMT
abline(v = mmt_value, col = "red", lwd = 2, lty = 2)
legend("topright", legend = paste("MMT =", round(mmt_value, 2)), col = "red", lty = 2, bty = "n")
I ran the code and the call to crosspred() (line 33) runs without errors. An error is thrown later when calling plot() (line 42), because you select var = median(temp_range), which is not among the values chosen for prediction in crosspred(). You can check them using pred_temp$predvar. This is likely because the median is between two actual values, and the mid-value is taken. Try using var = temp_range[50] and it works.
Next time, please check the error messages more carefully.
Best -Antonio Gasparrini
Thank you so much, Antonio,
I agree, and your hints were super useful. I tried to reach out by email, but I understand how busy you are. So, I did some predictions for future scenarios based on the steps we performed: Calculation of RR vs. temperature for historical data. The results show a reasonable pattern. (Thanks for your help!). Then, we used the model for future predictions, considering temperature (knowing that we do not have predictions for mortality). The models produced the results; however, I see two issues: the RR scale is up even to 2000 and 300, given that temperature does not increase that much. I have attached our code, inputs, and results. Would you please give us another hint?
Best regards, Saeid Vaghefi
# Load necessary libraries
library(dlnm)
library(splines)
library(ggplot2)
calculate_rr_future <- function(df, df_future, his_temp_col_name, fu_temp_cols, rcp_label, death_col_name, ref_temp, quantile_threshold=0.25, df_spline=3, window_size=7) {
rr_list <- list() # Initialize empty list to store RR results
max_lag <- window_size # Define maximum lag
temp_col <- df[[his_temp_col_name]] # Historical temperature column
death_col <- df[[death_col_name]] # Death column
# Create a crossbasis for the historical temperature with splines for lag effects
cb_temp <- crossbasis(temp_col, lag = max_lag,
argvar = list(fun = "ns", df = df_spline),
arglag = list(fun = "ns", df = df_spline),
)
# Fit the GLM model with quasi-Poisson family
model <- glm(death_col ~ cb_temp, family = quasipoisson())
# Set up temperature range for prediction
temp_range <- seq(min(temp_col, na.rm = TRUE), max(temp_col, na.rm = TRUE), length.out = 100) # Adjust length.out as needed
# Predict relative risk (RR) using crosspred
pred <- crosspred(cb_temp, model, at = temp_range, cen = reference_temperature, cumul = TRUE)
# Extract predicted RR values and temperature
rr_fit <- pred$allRRfit
# Calculate Minimum Mortality Temperature (MMT)
mmt <- temp_range[which.min(abs(rr_fit - 1))]
# Plot Relative Risk of Mortality vs. Temperature
plot(temp_range, rr_fit, type = "l", col = "blue", lwd = 2,
xlab = "Tmax", ylab = "Relative Risk (RR)",
main = paste("Relative Risk of Mortality vs. Tmax", "Historical Data"))
abline(v = mmt, col = "green", lty = 2, lwd = 2) # MMT line
abline(h = 1, col = "red", lty = 2) # Reference line at RR = 1
# Add text annotation for MMT
text(mmt, max(rr_fit), labels = paste("MMT =", round(mmt, 2), "°C"), col = "green", pos = 4)
# Loop over each future temperature column
for (fu_temp_col_name in fu_temp_cols) {
fu_temp_col <- df_future[[fu_temp_col_name]] # Extract future temperature column
fu_date_col <- df_future[["date"]] # Extract future date column
# Set up temperature range for prediction
temp_range_future <- seq(min(fu_temp_col, na.rm = TRUE), max(fu_temp_col, na.rm = TRUE), length.out = length(fu_temp_col))
# Predict future relative risk (RR) using crosspred
pred_future <- crosspred(cb_temp, model, at = temp_range_future, cen = ref_temp, cumul = TRUE)
# Extract predicted RR values
rr_fit2 <- pred_future$allRRfit
# Create a DataFrame for this RCP column's RR values
rr_df <- data.frame(
Temperature = temp_range_future,
RR = rr_fit2,
date = fu_date_col,
RCP = rcp_label
)
rr_list[[fu_temp_col_name]] <- rr_df # Append to list
}
# Combine all RR DataFrames into one
do.call(rbind, rr_list)
}
# Load your CSV files
merged_data <- read.csv("C:/not share/NCCS/NCCS-RESULT/test/UR_merged_data.csv")
future_climate_data <- read.csv("C:/not share/NCCS/NCCS-RESULT/test/UR_future_climate_data.csv")
window_size <- 7 # Number of days to consider in the lag
df_spline <- 3
reference_temperature <- mean(merged_data$Tmax, na.rm = TRUE) # Adjusted to calculate the mean of 'Tmax' in R
# Filter columns for each RCP scenario
rcp26_columns <- grep("RCP26", names(future_climate_data), value = TRUE)
rcp45_columns <- grep("RCP45", names(future_climate_data), value = TRUE)
rcp85_columns <- grep("RCP85", names(future_climate_data), value = TRUE)
# Calculate RR future for each RCP scenario
rr_future_26 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp26_columns, "RCP26", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)
rr_future_45 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp45_columns, "RCP45", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)
rr_future_85 <- calculate_rr_future(merged_data, future_climate_data, "Tmax", rcp85_columns, "RCP85", 'death_count', reference_temperature, quantile_threshold = 0.1, df_spline = df_spline,window_size=window_size)
rr_future_all <- do.call(rbind, list(rr_future_26, rr_future_45, rr_future_85))
# Load the lubridate library for date manipulation
library(lubridate)
# Ensure 'date' column is in Date format
rr_future_all$date <- as.Date(rr_future_all$date)
rr_future_26$date <- as.Date(rr_future_26$date)
rr_future_45$date <- as.Date(rr_future_45$date)
rr_future_85$date <- as.Date(rr_future_85$date)
# Extract year and assign decade for rr_future_all
rr_future_all$year <- year(rr_future_all$date)
rr_future_all$decade <- (rr_future_all$year %/% 10) * 10
# Extract year and assign decade for each RCP dataframe
rr_future_26$year <- year(rr_future_26$date)
rr_future_26$decade <- (rr_future_26$year %/% 10) * 10
rr_future_45$year <- year(rr_future_45$date)
rr_future_45$decade <- (rr_future_45$year %/% 10) * 10
rr_future_85$year <- year(rr_future_85$date)
rr_future_85$decade <- (rr_future_85$year %/% 10) * 10
# Define color palette
palette <- c("RCP26" = "green", "RCP45" = "orange", "RCP85" = "red")
# Create boxplot for RR by Decade and RCP Scenario
# Split the data by RCP and then plot each RCP's data with `boxplot()`
par(mfrow = c(1, 1)) # Reset to single plot layout
# Subset data for each RCP
rcp26_data <- rr_future_all[rr_future_all$RCP == "RCP26", ]
rcp45_data <- rr_future_all[rr_future_all$RCP == "RCP45", ]
rcp85_data <- rr_future_all[rr_future_all$RCP == "RCP85", ]
# Define color palette
palette <- c("RCP26" = "green", "RCP45" = "orange", "RCP85" = "red")
# Create boxplot for RR by Decade and RCP Scenario
# Split the data by RCP and then plot each RCP's data with `boxplot()`
par(mfrow = c(1, 1)) # Reset to single plot layout
# Subset data for each RCP
rcp26_data <- rr_future_all[rr_future_all$RCP == "RCP26", ]
rcp45_data <- rr_future_all[rr_future_all$RCP == "RCP45", ]
rcp85_data <- rr_future_all[rr_future_all$RCP == "RCP85", ]
# Boxplot with base R
boxplot(RR ~ decade, data = rcp26_data, col = palette["RCP26"],
ylim = range(rr_future_all$RR, na.rm = TRUE),
xlab = "Decade", ylab = "Relative Risk (RR)",
main = paste('test', "Relative Risk (RR) by Decade and RCP Scenario"))
boxplot(RR ~ decade, data = rcp45_data, col = palette["RCP45"], add = TRUE)
boxplot(RR ~ decade, data = rcp85_data, col = palette["RCP85"], add = TRUE)
legend("topright", legend = names(palette), fill = palette, title = "RCP Scenario")
# Calculate yearly average RR for line plot
rr_yearly_avg <- aggregate(RR ~ year + RCP, data = rr_future_all, mean, na.rm = TRUE)
# Create line plot for RR by Year and RCP Scenario
par(mfrow = c(1, 1)) # Reset to single plot layout
plot(NULL, xlim = range(rr_yearly_avg$year), ylim = range(rr_yearly_avg$RR, na.rm = TRUE),
xlab = "Year", ylab = "Average Relative Risk (RR)",
main = paste('test', "Trend of Relative Risk (RR) by Year and RCP Scenario"))
# Add lines for each RCP scenario
for (rcp in names(palette)) {
rcp_data <- rr_yearly_avg[rr_yearly_avg$RCP == rcp, ]
lines(rcp_data$year, rcp_data$RR, type = "b", col = palette[rcp], pch = 16)
}
# Add legend
legend("topright", legend = names(palette), col = palette, pch = 16, title = "RCP Scenario")
Hi saeedashraf,
thanks for your interest and your question. However, I prefer keeping this space for discussions strictly focused on the dlnm package, and not general analyses.
best -Antonio Gasparrini
Hello team,
can you help me to figure out the following error in the sample code afterward?
error:
Sample Code: