Closed OndrejMottl closed 1 year ago
This is relevant to #98
I have been looking into the possible ways to make a new classification based on the Koopen-Geiger climate zones.
The GH classification has 3 levels (see wiki), of those we are using 1st (5 classes) and 3rd (30 classes). In addition, John made some alterations to the 2nd (now 15 classes).
I decided to review the process, here is a reproducible script:
#----------------------------------------------------------#
# 0. Setup -----
#----------------------------------------------------------#
library(here)
# Load configuration
source(
here::here(
"R/00_Config_file.R"
)
)
maximum_n_records_to_run_cluster <- 200
optimum_max_n_records_to_run_cluster <- 100
minumum_n_records_to_run_cluster <- 5
#----------------------------------------------------------#
# 2. load data -----
#----------------------------------------------------------#
data_meta <-
targets::tar_read(
name = "data_meta",
store = paste0(
data_storage_path,
"_targets_h1"
)
)
# we are not interested
data_work <-
data_meta %>%
# we are not interested in Africa
dplyr::filter(region != "Africa") %>%
# we are want to filter out weird islands in the ocean
dplyr::mutate(
is_oceania_outlier = ifelse(region == "Oceania" & long < -100, TRUE, FALSE)
) %>%
dplyr::filter(is_oceania_outlier == FALSE)
#----------------------------------------------------------#
# 3. helper functions -----
#----------------------------------------------------------#
get_classification_plot <- function(data_source) {
data_source %>%
dplyr::group_by(region, sel_classification) %>%
dplyr::summarise(
.groups = "drop",
count = n()
) %>%
dplyr::mutate(
dplyr::across(
tidyselect::where(
is.character
),
as.factor
)
) %>%
ggplot2::ggplot(
mapping = ggplot2::aes(
x = reorder(sel_classification, -count),
y = count,
fill = sel_classification
)
) +
ggplot2::facet_wrap(~region) +
ggplot2::geom_bar(
stat = "identity"
) +
ggplot2::geom_hline(
yintercept = maximum_n_records_to_run_cluster
) +
ggplot2::geom_hline(
yintercept = optimum_max_n_records_to_run_cluster,
lty = "dashed"
) +
ggplot2::geom_hline(
yintercept = minumum_n_records_to_run_cluster,
lty = "dotted"
) +
ggplot2::scale_y_continuous(
trans = "log",
breaks = c(0, 1, 3, 5, 10, 20, 50, 100, 150, 200, 300, 500)
) +
ggplot2::theme_bw() +
ggplot2::theme(
legend.position = "none",
axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5),
axis.title.x = ggplot2::element_blank()
) +
ggplot2::labs(
y = "N records"
)
}
get_map_plot <- function(data_source) {
ggplot2::ggplot(
data = data_source,
mapping = ggplot2::aes(
x = long,
y = lat,
col = sel_classification
)
) +
ggplot2::borders(
fill = "gray90",
col = NA
) +
ggplot2::geom_point() +
ggplot2::theme(legend.position = "bottom") +
ggplot2::coord_quickmap(
xlim = range(data_source$long),
ylim = range(data_source$lat)
) +
ggplot2::theme_void()
}
get_classification_map_by_continent <- function(data_source) {
data_source %>%
split(.$region) %>%
purrr::map(
.f = ~ ggpubr::ggarrange(
get_classification_plot(.x),
get_map_plot(.x),
widths = c(0.5, 1),
common.legend = TRUE,
legend = "none"
)
) %>%
ggpubr::ggarrange(
plotlist = .,
common.legend = TRUE,
legend = "none"
)
}
There I will decide to select 3 important values:
maximum_n_records_to_run_cluster
(200) is the max we can MAYBE run before the cluster cuts us offoptimum_max_n_records_to_run_cluster
(100) should run without any issue on the supercomputerminumum_n_records_to_run_cluster
(5) is the minimal number of records we need to sum any results (and build a general trend of the predictorMy goal was to start with the most coarse and only go more fine in climate zones, which are too large.
#----------------------------------------------------------#
# 4. Classification test -----
#----------------------------------------------------------#
# 4.1 --- Koppen-Geiger classification - 5 levels
data_work %>%
dplyr::mutate(
sel_classification = ecozone_koppen_5
) %>%
get_classification_map_by_continent()
# 4.2 --- Split the "cold" into:
# - Dry winter
# - Dry summer
# - without dry season
data_work %>%
dplyr::mutate(
sel_classification = dplyr::case_when(
ecozone_koppen_5 == "Cold" ~ ecozone_koppen_15,
.default = ecozone_koppen_5
)
) %>%
get_classification_map_by_continent()
# 4.3 --- Split the "cold - without dry season" into:
# - hot summer
# - warm summer
# - cold summer
data_work %>%
dplyr::mutate(
sel_classification = dplyr::case_when(
ecozone_koppen_15 == "Cold_Without_dry_season" ~ ecozone_koppen_30,
ecozone_koppen_5 == "Cold" ~ ecozone_koppen_15,
.default = ecozone_koppen_5
)
) %>%
get_classification_map_by_continent()
I am very happy about this as we split it into manageable classes while only losing a few records in too small climate zones. Thoughts @Vivian-Felde, @SFlantua ?
Based on how it looks in Europe and North America, I really like the 4.3 - to me that makes a lot more sense to break up with these smaller zones.
I am proposing 4.3. The others are just a way of how I got there.
I agree with your proposal, to me it looks much improved from what we currently use.
Here is an overview of how the proposed classification (sel_classiifcation
) lays in between KG5 and KG30:
I also like 4.3 most. However, all continents except LA have the clusters quite well spatially organized, but in LA the "temperate" cluster is all over the place. Is there anything that can be done there?
If we want to keep with the GH, then I do not think so. I try to also split the temperate and it does not help with LA. However, it may be a better solution for EU and NA
# 4.4 --- Split also the "temperate" into:
# - Dry winter
# - Dry summer
# - without dry season
data_work %>%
dplyr::mutate(
sel_classification = dplyr::case_when(
ecozone_koppen_15 == "Cold_Without_dry_season" ~ ecozone_koppen_30,
ecozone_koppen_5 == "Cold" ~ ecozone_koppen_15,
ecozone_koppen_5 == "Temperate" ~ ecozone_koppen_15,
.default = ecozone_koppen_5
)
) %>%
get_classification_map_by_continent()
Agree, it's not very helpful for LA. The whole climate zone spatial delimitation is not really working for LA. We will just have to go with it then and hope that it doesnt become a critical points for reviewers and/or readers.
@Vivian-Felde do you agree with the 4.4? Or should we stick with 4.3?
I like 4.4 even better.
The current classification of 15 climate zones does not work as some zones are still very large.
we need to