improve sciname_metadata

Possible way to combine multiple rows (sciname/common_name) for a single species.

ChatGPT suggestion 🦾

library(dplyr)
library(stringr)

# Sample data
df <- data.frame(
  sciname = c("species1", "species1", "species2", "species3", "species3"),
  common_name = c(NA, "preferred value1", "value2", NA, "value3"),
  column2 = c("valueA", NA, NA, "valueB", NA)
)

# Custom function to select common_name based on a keyword
select_common_name <- function(names, keyword = "preferred") {
  # Drop missing names first: str_detect() returns NA for NA input,
  # and subsetting with NA would put NA entries at the front of the result
  names <- names[!is.na(names)]
  # Prioritize names containing the keyword
  preferred_names <- names[str_detect(names, keyword)]
  if (length(preferred_names) > 0) {
    return(preferred_names[1]) # Return the first match
  }
  # No match: return the first non-NA name (NA if the group has none)
  names[1]
}

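# Combine rows with custom common_name selection
df_combined <- df %>%
  group_by(sciname) %>%
  summarise(
    common_name = select_common_name(common_name),
    # For every other column, keep the first non-NA value in the group
    across(-common_name, ~ .x[!is.na(.x)][1]),
    .groups = "drop"
  )

df_combined

summarise(..., .groups = "drop"): For each sciname group, common_name is chosen by select_common_name(), and across(-common_name, ~ .x[!is.na(.x)][1]) reduces every remaining column to its first non-NA value (NA if the group has none). .groups is an argument of summarise(), not across(), so it sits at the summarise() level. coalesce() is not used for this step because it works element-wise across several vectors rather than collapsing one vector to a single value.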
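
To make the "first non-NA value per group" behavior concrete, here is a minimal base R sketch on one column of the sample data. `first_non_na` is a hypothetical helper introduced only for illustration; it is not part of dplyr or of the artis-model code.

```r
# Hypothetical helper: first non-NA element of a vector, or NA if there is none
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

# Same sciname/column2 values as in the sample data
sciname <- c("species1", "species1", "species2", "species3", "species3")
column2 <- c("valueA", NA, NA, "valueB", NA)

# Split column2 by species, then collapse each group to a single value
collapsed <- vapply(split(column2, sciname), first_non_na, character(1))
collapsed
```

In this sketch species2 stays NA because none of its rows carries a value, which is likely the behavior wanted when merging duplicate sciname rows: nothing is invented for a species with no data.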

Seafood-Globalization-Lab / artis-model

improve sciname_metadata #5