Open jagephart opened 3 months ago
Possible way to combine multiple rows (sciname/common_name) for a single species.
ChatGPT suggestion 🦾
library(dplyr)
library(stringr)
# Sample data
df <- data.frame(
  sciname = c("species1", "species1", "species2", "species3", "species3"),
  common_name = c(NA, "preferred value1", "value2", NA, "value3"),
  column2 = c("valueA", NA, NA, "valueB", NA)
)
# Custom function to select common_name based on a keyword
select_common_name <- function(names, keyword = "preferred") {
  # Drop NA first: str_detect() returns NA for NA input, and indexing with NA yields NA elements
  names <- names[!is.na(names)]
  # Prioritize names containing the keyword
  preferred_names <- names[str_detect(names, keyword)]
  if (length(preferred_names) > 0) {
    return(preferred_names[1]) # Return the first match
  } else {
    return(names[1]) # First non-NA value if no match (NA if all values are missing)
  }
}
# Combine rows with custom common_name selection
df_combined <- df %>%
  group_by(sciname) %>%
  summarise(
    common_name = select_common_name(common_name),
    # For every other column, keep the first non-NA value in the group
    across(-common_name, ~ .x[!is.na(.x)][1]),
    .groups = "drop"
  )
df_combined
In the summarise() call, across(-common_name, ~ .x[!is.na(.x)][1]) takes the first non-NA value of every remaining column within each group; common_name is excluded because select_common_name() has already handled it. .groups = "drop" is an argument of summarise(), not across(), and returns an ungrouped data frame.
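The keyword-priority rule can also be checked in isolation. Below is a minimal base R sketch, assuming a hypothetical helper named pick_name (not part of the suggestion above) and plain substring matching via grepl(). It drops missing values before matching, so an NA in the group cannot be returned ahead of a real name.

```r
# Hypothetical helper (name and behavior are illustrative assumptions):
# pick one common name per species, preferring names that contain a keyword.
pick_name <- function(names, keyword = "preferred") {
  candidates <- names[!is.na(names)]            # drop missing values first
  if (length(candidates) == 0) {
    return(NA_character_)                       # nothing usable in this group
  }
  # fixed = TRUE treats the keyword as a literal substring, not a regex
  hits <- candidates[grepl(keyword, candidates, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else candidates[1]
}

pick_name(c(NA, "preferred value1"))         # "preferred value1"
pick_name(c(NA, "value3"))                   # "value3"
pick_name(c(NA_character_, NA_character_))   # NA
```

The three calls mirror the cases in the sample data: a group with a preferred name, a group with only a plain name, and a group with no name at all.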
sciname_metadata may not include all SAU scinames and is missing some common names: