data-mermaid / mermaidr

R package for accessing MERMAID authenticated API endpoints
https://data-mermaid.github.io/mermaidr/
MIT License

Enhance reference data #21

Closed: sharlagelfand closed this issue 3 years ago

sharlagelfand commented 3 years ago

Either enhance data returned from mermaid_get_reference() directly, or write wrapper functions that return enhanced data, e.g. mermaid_get_benthic_attributes(), mermaid_get_fish_genera(), etc.
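
A wrapper of this kind would mostly need to resolve ID columns by looking them up in a reference table. Below is a minimal base-R sketch for the benthic attributes case, where parent points back into the same table. enhance_benthic_attributes() is a hypothetical name, not part of mermaidr, and the small data frame is a made-up stand-in for the output of mermaid_get_reference("benthicattributes").

```r
# Hypothetical helper (not part of mermaidr): replace the parent ID in each
# row with the parent's name, looked up in the same table.
# IDs with no match (or NA) become NA.
enhance_benthic_attributes <- function(benthicattributes) {
  parent_idx <- match(benthicattributes$parent, benthicattributes$id)
  benthicattributes$parent <- benthicattributes$name[parent_idx]
  benthicattributes
}

# Made-up stand-in for mermaid_get_reference("benthicattributes")
benthicattributes <- data.frame(
  id = c("a1", "b2", "c3"),
  name = c("Hard coral", "Acroporidae", "Acropora"),
  parent = c(NA, "a1", "b2"),
  stringsAsFactors = FALSE
)

enhanced <- enhance_benthic_attributes(benthicattributes)
enhanced$parent
```

Using match() rather than a join keeps exactly one output row per input row even if the lookup table ever contained duplicate IDs; left_join() would duplicate rows in that case.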

"Enhanced" meaning that columns which currently hold only IDs (e.g. parent, family, genus) would instead return the actual values those IDs refer to. Some sketch code showing how to get this and what it could look like:

library(dplyr)
library(mermaidr)
library(tibble)

# Benthic Attributes ----

# Replace the "parent" ID with the parent's actual name

benthicattributes <- mermaid_get_reference("benthicattributes")

benthicattributes
#> # A tibble: 403 x 6
#>    id             name       status parent         updated_on     created_on    
#>    <chr>          <chr>      <chr>  <chr>          <chr>          <chr>         
#>  1 01d8297a-1d37… Halophila  Open   c9c20002-a7fd… 2018-04-04T19… 2018-04-04T19…
#>  2 02160430-8036… Mussa ang… Open   e5e686e5-5f55… 2020-02-17T19… 2020-02-17T19…
#>  3 046707b4-bae6… Neopetros… Open   6b9fdd82-685f… 2020-02-17T19… 2020-02-17T19…
#>  4 04a8165f-c936… Palisada   Open   09226989-50e7… 2020-07-01T14… 2020-07-01T14…
#>  5 04b943e8-610c… Gardinero… Open   3151a2f4-1f93… 2018-04-04T19… 2018-04-04T19…
#>  6 050c0689-0b01… Acropora   Open   a36b36c6-1f9f… 2018-04-04T19… 2018-04-04T19…
#>  7 053de64e-a2b9… Padina     Open   09226989-50e7… 2019-02-04T21… 2018-04-04T19…
#>  8 0682d950-18fb… Hypogloss… Open   09226989-50e7… 2020-02-17T19… 2020-02-17T19…
#>  9 07828914-a3e2… Cliona     Open   6b9fdd82-685f… 2020-02-17T19… 2020-02-17T19…
#> 10 08497e2e-570d… Acanthoph… Open   09226989-50e7… 2020-02-17T19… 2020-02-17T19…
#> # … with 393 more rows

# Sketch of code

benthicattributes %>%
  left_join(benthicattributes %>%
              select(parent_id = id, parent = name), by = c("parent" = "parent_id"), suffix = c("_id", ""))
#> # A tibble: 403 x 6
#>    id               name        status parent    updated_on      created_on     
#>    <chr>            <chr>       <chr>  <chr>     <chr>           <chr>          
#>  1 01d8297a-1d37-4… Halophila   Open   Hydrocha… 2018-04-04T19:… 2018-04-04T19:…
#>  2 02160430-8036-4… Mussa angu… Open   Mussidae  2020-02-17T19:… 2020-02-17T19:…
#>  3 046707b4-bae6-4… Neopetrosi… Open   Sponge    2020-02-17T19:… 2020-02-17T19:…
#>  4 04a8165f-c936-4… Palisada    Open   Macroalg… 2020-07-01T14:… 2020-07-01T14:…
#>  5 04b943e8-610c-4… Gardineros… Open   Agaricii… 2018-04-04T19:… 2018-04-04T19:…
#>  6 050c0689-0b01-4… Acropora    Open   Acropori… 2018-04-04T19:… 2018-04-04T19:…
#>  7 053de64e-a2b9-4… Padina      Open   Macroalg… 2019-02-04T21:… 2018-04-04T19:…
#>  8 0682d950-18fb-4… Hypoglossum Open   Macroalg… 2020-02-17T19:… 2020-02-17T19:…
#>  9 07828914-a3e2-4… Cliona      Open   Sponge    2020-02-17T19:… 2020-02-17T19:…
#> 10 08497e2e-570d-4… Acanthopho… Open   Macroalg… 2020-02-17T19:… 2020-02-17T19:…
#> # … with 393 more rows

# Fish Genera ----

# Replace the "family" ID with the actual family name

fishgenera <- mermaid_get_reference("fishgenera")
fishfamilies <- mermaid_get_reference("fishfamilies")

fishgenera %>%
  left_join(fishfamilies %>%
              select(id, family = name), by = c("family" = "id"), suffix = c("_id", "")) %>%
  glimpse()
#> Rows: 750
#> Columns: 9
#> $ id                 <chr> "00b8d1bd-b873-400f-bf81-6abddf0d13ce", "018c6b47-…
#> $ name               <chr> "Ostorhinchus", "Nebrius", "Cirrhilabrus", "Oxyuri…
#> $ status             <chr> "Open", "Open", "Open", "Open", "Open", "Open", "O…
#> $ biomass_constant_a <dbl> 0.009441, 0.004170, 0.016603, 0.009780, 0.003890, …
#> $ biomass_constant_b <dbl> 3.198185, 3.070000, 2.955000, 2.989112, 3.120000, …
#> $ biomass_constant_c <dbl> 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, …
#> $ family             <chr> "Apogonidae", "Ginglymostomatidae", "Labridae", "G…
#> $ created_on         <chr> "2018-04-04T19:03:58.270624Z", "2018-04-04T19:04:1…
#> $ updated_on         <chr> "2018-04-04T19:03:58.270643Z", "2018-04-04T19:04:1…

# Fish Species ----

# Rename "display" to "species"; replace the genus, group_size, trophic_group, and functional_group IDs with their actual values

fishspecies <- mermaid_get_reference("fishspecies")
choices <- mermaid_get_endpoint("choices") %>%
  deframe()

fishgroupsizes <- choices[["fishgroupsizes"]] %>%
  select(id, group_size = name)

fishgrouptrophics <- choices[["fishgrouptrophics"]] %>%
  select(id, trophic_group = name)

fishgroupfunctions <- choices[["fishgroupfunctions"]] %>%
  select(id, functional_group = name)

genus <- fishgenera %>%
  select(id, genus = name)

fishspecies %>%
  rename(species = display) %>%
  left_join(genus, by = c("genus" = "id"), suffix = c("_id", "")) %>%
  left_join(fishgroupsizes, by = c("group_size" = "id"), suffix = c("_id", "")) %>%
  left_join(fishgrouptrophics, by = c("trophic_group" = "id"), suffix = c("_id", "")) %>%
  left_join(fishgroupfunctions, by = c("functional_group" = "id"), suffix = c("_id", "")) %>%
  glimpse()
#> Rows: 3,275
#> Columns: 19
#> $ id                 <chr> "0006e6d8-7501-4c2d-9cda-263194f8e58b", "001335da-…
#> $ name               <chr> "rubrioperculatus", "maculatus", "perezii", "bigib…
#> $ species            <chr> "Lethrinus rubrioperculatus", "Aulostomus maculatu…
#> $ notes              <chr> "", "", "", "", "subfamily bodyshape", "subfamily …
#> $ status             <chr> "Open", "Open", "Open", "Open", "Open", "Open", "O…
#> $ biomass_constant_a <dbl> 0.012790, 0.003960, 0.027100, 0.014790, 0.028180, …
#> $ biomass_constant_b <dbl> 3.039657, 3.080000, 3.000000, 2.990000, 2.094000, …
#> $ biomass_constant_c <dbl> 1.000000, 1.000000, 1.000000, 1.000000, 1.000000, …
#> $ climate_score      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ vulnerability      <dbl> 39.56, 49.99, 76.49, 55.67, 10.00, 17.67, 19.23, 6…
#> $ max_length         <dbl> 50.00, 100.00, 300.00, 75.00, 9.00, 10.00, 7.50, 7…
#> $ trophic_level      <dbl> 3.83, 4.26, 4.50, 2.56, 3.10, 3.40, 2.76, 3.90, 3.…
#> $ max_length_type    <chr> "total length", "total length", "total length", "t…
#> $ genus              <chr> "Lethrinus", "Aulostomus", "Carcharhinus", "Kyphos…
#> $ group_size         <chr> "small group", "solitary", "solitary", "medium gro…
#> $ trophic_group      <chr> "invertivore-mobile", "piscivore", "piscivore", "h…
#> $ functional_group   <chr> "macro-invertivore", "pisci-invertivore", "piscivo…
#> $ created_on         <chr> "2018-04-04T19:04:23.107621Z", "2020-02-17T19:24:2…
#> $ updated_on         <chr> "2020-02-17T19:10:43.627478Z", "2020-02-17T19:24:2…

# Fish Families ----

# All good, no changes needed

mermaid_get_reference("fishfamilies")
#> # A tibble: 162 x 8
#>    id    name  status biomass_constan… biomass_constan… biomass_constan…
#>    <chr> <chr> <chr>             <dbl>            <dbl>            <dbl>
#>  1 0091… Kyph… Open            0.0193              3.03            0.986
#>  2 00b6… Mugi… Open            0.0166              2.94            0.974
#>  3 00f4… Zena… Open            0.00427             3.02            1    
#>  4 0226… Sphy… Open            0.00448             3.11            1    
#>  5 0880… Labr… Open            0.0120              3.04            0.997
#>  6 0aff… Scom… Open            0.0111              3.03            0.988
#>  7 0b69… Ophi… Open            0.00139             2.93            1    
#>  8 0d99… Albu… Open            0.0105              2.99            1    
#>  9 0e5b… Hemi… Open            0.0373              3.16            0.99 
#> 10 1513… Serr… Open            0.0136              3.03            0.997
#> # … with 152 more rows, and 2 more variables: created_on <chr>,
#> #   updated_on <chr>
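
The fish species step above chains one left_join() per ID column. Since every lookup table has the same shape (an id column and a name column), the same replacements could be driven from a named list instead. This is a base-R sketch; replace_ids() is a hypothetical helper, and the data frames are made-up stand-ins for fishspecies, fishgenera, and the fishgroupsizes entry of choices.

```r
# Hypothetical helper: for each column named in `lookups`, replace its IDs
# with the `name` value of the matching row in that lookup table.
replace_ids <- function(data, lookups) {
  for (col in names(lookups)) {
    lookup <- lookups[[col]]
    data[[col]] <- lookup$name[match(data[[col]], lookup$id)]
  }
  data
}

# Made-up stand-ins for the API results
fishspecies <- data.frame(
  id = c("s1", "s2"),
  display = c("Lethrinus rubrioperculatus", "Aulostomus maculatus"),
  genus = c("g1", "g2"),
  group_size = c("z1", "z2"),
  stringsAsFactors = FALSE
)
fishgenera <- data.frame(id = c("g1", "g2"), name = c("Lethrinus", "Aulostomus"),
                         stringsAsFactors = FALSE)
fishgroupsizes <- data.frame(id = c("z1", "z2"), name = c("small group", "solitary"),
                             stringsAsFactors = FALSE)

# Rename "display" to "species", then resolve the genus and group_size IDs
names(fishspecies)[names(fishspecies) == "display"] <- "species"
species_enhanced <- replace_ids(
  fishspecies,
  list(genus = fishgenera, group_size = fishgroupsizes)
)
species_enhanced[, c("species", "genus", "group_size")]
```

Resolving trophic_group and functional_group would then be one more entry each in the lookups list rather than another join.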
esdarling commented 3 years ago

This is looking great! Can we add region for all of them? It looks like it's available for each of the reference types online: https://collect.datamermaid.org/#/reference/home
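
Region differs from the other ID columns because a single family, genus, or species can occur in several regions. Assuming the API returns a list column (here called regions) holding a character vector of region IDs per row, plus a region lookup table with id and name columns (both of these are assumptions, not confirmed by the thread), the IDs could be resolved like this. resolve_regions() is a hypothetical helper and all data are made up.

```r
# Assumption: `ids` is a list where each element is a character vector of
# region IDs; `regions_lookup` has `id` and `name` columns.
# Each row's region names are collapsed into one comma-separated string.
resolve_regions <- function(ids, regions_lookup) {
  unname(vapply(ids, function(x) {
    paste(regions_lookup$name[match(x, regions_lookup$id)], collapse = ", ")
  }, character(1)))
}

# Made-up stand-ins for a region lookup and a reference table
regions_lookup <- data.frame(
  id = c("r1", "r2", "r3"),
  name = c("Region A", "Region B", "Region C"),
  stringsAsFactors = FALSE
)
fishfamilies <- data.frame(id = c("f1", "f2"), name = c("Kyphosidae", "Mugilidae"),
                           stringsAsFactors = FALSE)
fishfamilies$regions <- list(c("r1", "r3"), "r2")

fishfamilies$region <- resolve_regions(fishfamilies$regions, regions_lookup)
fishfamilies$region
```

Collapsing to a single string keeps the column easy to print and filter; keeping a list column of names instead would preserve the individual region values.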

sharlagelfand commented 3 years ago

Added in 0.3.1! Closing this issue