egouldo / ManyEcoEvo

Software for analysing Many-Analysts' style data and generating the ManyEcoEvo project data
https://egouldo.github.io/ManyEcoEvo/
GNU General Public License v3.0

ensure extreme SE values excluded prior to meta-analysis of logged yi vals #146

Closed egouldo closed 1 week ago

egouldo commented 2 weeks ago

The relevant line from the logged analysis in the manuscript is:

mutate(exclusion_threshold = param_mean + 3*param_sd) %>%

i.e., in context:

back_transformed_predictions <-
  ManyEcoEvo_yi %>%
  prepare_response_variables_yi(estimate_type = "yi",
                                param_table = ManyEcoEvo:::analysis_data_param_tables) %>%
  generate_yi_subsets()

raw_mod_data_logged <-
  back_transformed_predictions %>%
  filter(dataset == "eucalyptus") %>%
  group_by(estimate_type) %>%
  select(estimate_type, data) %>%
  unnest(data) %>%
  rename(study_id = id_col) %>%
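  # pull the first and second entries of params$value out as mean and sd
  hoist(params, param_mean = list("value", 1), param_sd = list("value", 2)) %>%
  rowwise() %>%
  # exclude extreme predictions on the response scale, before logging
  mutate(exclusion_threshold = param_mean + 3*param_sd) %>%
  filter(fit < exclusion_threshold) %>%
  # log-transform the remaining predictions and their standard errors
  mutate(log_vals = map2(fit, se.fit, log_transform, 1000)) %>%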
  unnest(log_vals) %>%
  select(study_id,
         TeamIdentifier,
         estimate_type,
         starts_with("response_"),
         -response_id_S2,
         ends_with("_log")) %>%
  group_by(estimate_type) %>%
  nest()

So extreme values are removed on the response scale, prior to logging.
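The ordering can be illustrated with a minimal base-R sketch. The data, `param_mean`, and `param_sd` below are made up for illustration, and plain `log()` stands in for `log_transform()`, so this is not the package's actual implementation, only the filter-then-log pattern:

```r
# Toy predictions on the response (count) scale; all values are made up.
preds <- data.frame(
  study_id = c("a", "b", "c", "d"),
  fit      = c(12, 30, 45, 5000),
  se.fit   = c(2, 4, 6, 900)
)

# Hypothetical population parameters standing in for the param table.
param_mean <- 40
param_sd   <- 25

# Same rule as the manuscript line: mean + 3 standard deviations.
exclusion_threshold <- param_mean + 3 * param_sd

# Filter on the response scale first ...
kept <- preds[preds$fit < exclusion_threshold, ]

# ... and only then log the surviving point estimates.
# (Plain log() is an assumption here; it stands in for log_transform().)
kept$fit_log <- log(kept$fit)
```

With these toy numbers the threshold is 115, so study "d" is dropped before any value is logged.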

egouldo commented 1 week ago

It just occurred to me why exclude_extreme_VZ() isn't behaving as intended for the Euc analyses: the estimates are now on the response scale (raw counts), whereas before they were Z-transformed. Applying the default threshold of 3 to estimates on the count scale isn't going to do much at all, which is why extreme values are getting through now!
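As a rough illustration of the scale issue (not of how exclude_extreme_VZ() itself works, which this sketch does not assume), a cutoff of 3 is expressed in standard-deviation units, so it only carries over to raw counts once it is converted with the population mean and sd. The values and parameter names below are made up:

```r
# Made-up point estimates on the count scale and hypothetical parameters.
fit        <- c(12, 30, 45, 5000)
param_mean <- 40
param_sd   <- 25
z_cutoff   <- 3   # cutoff expressed in standard-deviation units

# Option 1: standardise the estimates, then compare with the Z-scale cutoff.
z      <- (fit - param_mean) / param_sd
keep_z <- z < z_cutoff

# Option 2: convert the cutoff to the response scale, then compare raw counts.
keep_raw <- fit < param_mean + z_cutoff * param_sd

# Both options keep the same rows.
same_rows <- identical(keep_z, keep_raw)
```

Option 2 is the form used in the manuscript pipeline above (`param_mean + 3*param_sd`), which is why it still catches extreme values on the count scale.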