Issue when creating user criterion

Scoobys-keeper commented 7 months ago

Hi there!

I am currently trying to use Matilda/Hector to score my 100 ensemble runs against observed global temperatures. When I try and define a new criterion for observational data, I am met with scored runs all equal to 0.01. I am trying to set my own criterion because when I attempt to use the inbuilt criterion_gmst_obs()

The error below is returned. Error in score_runs(results_100, criterion_gmst_obs(), score_ramp, w1 = 2, : criterion year and variable combination not represented in data

Any ideas why thecriterion_gmst_obs() would return such an error? My ideal fix is solving this so I dont have to fuss with creating my own criterion

Also, am I correct to think this criterion of global_tas included in the vignette (shown below) uses a generated sequence of values as opposed to actual observational data? Either way, using the snippet below I still receive scored values of 0.01 for all 100 observations. my_criterion <- new_criterion(GLOBAL_TAS(), years = 1951:2000, obs_values = seq(0.4, 1.0, length.out = 50)

Thank you for your help!! Code pasted below.

ini_file <- system.file("input/hector_ssp245.ini", package = "hector")
core <- newcore(ini_file, name="SSP_245")
core

set.seed(2455)
param_sets <- generate_params (core = core, draws = 100)
print(param_sets)

results_100 <- iterate_model(core = core, params = param_sets, save_years = 1850:2100)
print(results_100)

#temp data from the HADcrut analysis 
temp_data <- read.csv("HADCRUT_1950-2023.csv", header = TRUE)

#Tried both of these options to no avail

#1- user_temp_criterion1 <- new_criterion(var = "global_tas", years = temp_data$Year,
                                     obs_values = temp_data$global_tas)

#2-my_criterion <- new_criterion(GLOBAL_TAS(),
                              years = 1951:2000,
                              obs_values = seq(0.4, 1.0, length.out = 50)) 

scored_runs_my <- score_runs(results_100, criterion = my_criterion, score_ramp, w1= 2, w2 = 20)

score_results <- merge(results_100, scored_runs_my, by = "run_number")

ggplot(data = score_results) +
  geom_line(aes(x = year, y = value, group = run_number, color = weights)) +
  scale_color_continuous() +
  facet_wrap(~variable, scales = "free_y")

Thank you for your time and apologies if I missed anything in the explanation! Willy

jk-brown commented 7 months ago

Hi Willy,

Hoping I can address this below:

The error you are getting when you try to use the inbuilt criterion criterion_gmst_obs() occurs when the result from iterate_model does not contain the year range and/or the variable you want to use for scoring. In this case the results probably are not returning the variable gmst - which is strange as I thought this was part of the default Hector result. At any rate, the easiest solution here would be to tell matilda to fetch gmst by specifying save_vars = GMST() or save_vars = "gmst"

ini_file <- system.file("input/hector_ssp245.ini", package = "hector")
core <- newcore(ini_file, name="SSP_245")
core

set.seed(2455)
param_sets <- generate_params (core = core, draws = 10)
print(param_sets)

results_100 <- iterate_model(core = core, params = param_sets, save_vars = GMST(), save_years = 1850:2100)
print(results_100)

# continue with weighting steps...

"_Also, am I correct to think this criterion of globaltas included in the vignette (shown below) uses a generated sequence of values as opposed to actual observational data? Either way, using the snippet below I still receive scored values of 0.01 for all 100 observations."

Thats correct, the vignette is simply showing how one could theoretically build their own criterion using any vector they provide. The sequence of values in the vignette is arbitrary and could very well produce a nonsensical result, especially when using score_ramp with bounds of w1 = 2 and w2 = 20 since those bounds are not on a scale that will detect differences in temperature between the criterion and modeled data. The result of 0.01 for all runs occurs because all the temperature differences are probably less than the lower bound w1 =2.

Give this a shot to test your code after re-running to fetch GMST: scored_runs_my <- score_runs(results_100, criterion = criterion_gmst_obs(), score_ramp, w1= 0.0, w2 = 1.0) in this case we are telling matilda that "perfect" scores should only be given to models that exactly match the criterion, and any models that have a temp difference more than 1C should be scored 0 because they are unreliable results.

Here is a link to preprint of the matilda paper, the accepted version is currently in press at PLOS Climate: https://eartharxiv.org/repository/view/5938/

Scoobys-keeper commented 7 months ago

@jk-brown Thank you for the direction here- I'll work through this now! The paper has been incredible and has helped get me this far along before asking for help!

Scoobys-keeper commented 7 months ago

Sorry to bother again, but wondering if you'd have any thoughts on this error

`

metric_lt <- new_metric(var = "global_tas", years = 2081:2100, op = mean)
print(metric_lt)

values_metric_lt <- metric_calc(results_10, metric_lt)

Error in FUN(X[[i]], ...) : variable not present in x, metric_value == NA

can include any other code as needed - thank you so much!!

Willy

jk-brown commented 7 months ago

My first inclination is that the long term metric you are defining is using var = global_tas but the variable that was fetched in result_10 is gmst. This is just an assumption based on the previous comments in the issue.

Thanks for pointing this out. I agree that this error is not well coded. But it means that in this line of code: values_metric_lt <- metric_calc(results_10, metric_lt) the metric values (in this case the long term mean global_tas) values_metric_lt cannot be calculated because the variable(s) included in the metric definition metric_lt is not available or cannot be found in the result data frame.

Based on the previous comments in the issue I would guess that is the problem. So I would guess running for gmst would work because then expected variables align:

metric_lt <- new_metric(var = "gmst", years = 2081:2100, op = mean)
print(metric_lt)

values_metric_lt <- metric_calc(results_10, metric_lt)

This should give you the 2081-2100 mean global average surface temperature anomaly for each run.

hope this helps

jk-brown commented 6 months ago

Hi Willy @Scoobys-keeper , wanted to see if issues you raised here were resolved for now?

Scoobys-keeper commented 6 months ago

Hi @jk-brown Yes all sorted here! Thanks so much :)

JGCRI / matilda

Issue when creating user criterion #80