AquaSat / AquaMatch_harmonize_WQP

https://aquasat.github.io/AquaMatch_harmonize_WQP/
MIT License

Column additions, times, etc. #88

mbrousil closed this 5 months ago

mbrousil commented 6 months ago

Hey @matthewross07 and @steeleb,

This PR should address the changes we've been discussing over the past week or so. Let me know if anything is missing! I went through my notes, and everything should be covered here.

Notable changes and scripts:

One request: does anyone have ideas for speeding up the code linked below? It added hours to the pipeline's run time.
https://github.com/rossyndicate/AquaMatch_harmonize_WQP/blob/c021fde8c9ac6bfa3e9a3aa4dd0f1e2859e3f754/3_harmonize/src/clean_wqp_data.R#L471-L481

Lastly, the final, aggregated dataset now looks like this:

library(tidyverse)
library(targets)

tar_load(p3_chla_agg_harmonized_feather)

str(p3_chla_agg_harmonized_feather)
#> tibble [3,320,745 × 27] (S3: tbl_df/tbl/data.frame)
#>  $ parameter                         : chr [1:3320745] "chlorophyll" "chlorophyll" "chlorophyll" "chlorophyll" ...
#>  $ OrganizationIdentifier            : chr [1:3320745] "11113300" "11113300" "11113300" "11113300" ...
#>  $ MonitoringLocationIdentifier      : chr [1:3320745] "11113300-00A-SOP" "11113300-00F-NFD" "11113300-00F-NFD" "11113300-00F-NFD" ...
#>  $ MonitoringLocationTypeName        : chr [1:3320745] "River/Stream" "River/Stream" "River/Stream" "River/Stream" ...
#>  $ ResolvedMonitoringLocationTypeName: chr [1:3320745] "Stream" "Stream" "Stream" "Stream" ...
#>  $ ActivityStartDate                 : Date[1:3320745], format: "2015-05-19" "2011-06-21" ...
#>  $ ActivityStartDateTime             : POSIXct[1:3320745], format: "2015-05-19 15:30:00" "2011-06-21 20:00:00" ...
#>  $ ActivityStartTime.TimeZoneCode    : chr [1:3320745] "EDT" "EDT" "EDT" "EDT" ...
#>  $ harmonized_tz                     : chr [1:3320745] "America/New_York" "America/New_York" "America/New_York" "America/New_York" ...
#>  $ harmonized_utc                    : POSIXct[1:3320745], format: "2015-05-19 19:30:00" "2011-06-22 00:00:00" ...
#>  $ harmonized_top_depth_value        : num [1:3320745] NA NA NA NA NA NA NA NA NA NA ...
#>  $ harmonized_top_depth_unit         : chr [1:3320745] "m" "m" "m" "m" ...
#>  $ harmonized_bottom_depth_value     : num [1:3320745] NA NA NA NA NA NA NA NA NA NA ...
#>  $ harmonized_bottom_depth_unit      : chr [1:3320745] "m" "m" "m" "m" ...
#>  $ harmonized_discrete_depth_value   : num [1:3320745] NA NA NA NA NA NA NA NA NA NA ...
#>  $ harmonized_discrete_depth_unit    : chr [1:3320745] "m" "m" "m" "m" ...
#>  $ depth_flag                        : num [1:3320745] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ mdl_flag                          : num [1:3320745] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ approx_flag                       : num [1:3320745] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ greater_flag                      : num [1:3320745] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tier                              : num [1:3320745] 2 2 2 2 2 2 2 2 2 2 ...
#>  $ field_flag                        : num [1:3320745] 2 2 2 2 2 2 2 2 2 2 ...
#>  $ harmonized_units                  : chr [1:3320745] "ug/L" "ug/L" "ug/L" "ug/L" ...
#>  $ subgroup_id                       : num [1:3320745] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ harmonized_row_count              : num [1:3320745] 1 1 1 1 1 1 1 1 2 1 ...
#>  $ harmonized_value_sd               : num [1:3320745] NA NA NA NA NA ...
#>  $ harmonized_value                  : num [1:3320745] NA 1.5 1.42 1.83 0.81 2.75 0.67 0.66 2.4 1.63 ...

Created on 2024-04-17 with reprex v2.1.0
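To make the new time and aggregation columns easier to read, here is a minimal base-R sketch of what they appear to encode. It is an illustration under stated assumptions, not the pipeline's actual code. `to_utc()` and `aggregate_subgroups()` are hypothetical helpers, and the input column `value` (the pre-aggregation measurement) is an assumed name. The sketch assumes that `harmonized_utc` is the local clock time interpreted in `harmonized_tz` and converted to UTC, and that `harmonized_value` is the mean of the rows sharing a `subgroup_id`. `to_utc()` converts once per unique time zone instead of once per row. That is a general pattern for avoiding row-wise work, but it has not been checked against the linked lines.

```r
# Hypothetical helper: interpret local clock times in their IANA zone
# (e.g. "America/New_York") and return the equivalent UTC times as strings.
# Loops over unique zones, not rows, so each zone is parsed in one vectorized call.
to_utc <- function(local_time, tz) {
  out <- rep(NA_character_, length(local_time))
  for (zone in unique(tz[!is.na(tz)])) {
    idx <- which(tz == zone)
    parsed <- as.POSIXct(local_time[idx], tz = zone)
    out[idx] <- format(parsed, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  }
  out
}

# Hypothetical helper: collapse rows sharing a subgroup_id into one row.
# Assumes a numeric `value` column; the aggregate is assumed to be a mean.
aggregate_subgroups <- function(df) {
  groups <- split(df$value, df$subgroup_id)
  data.frame(
    subgroup_id = as.numeric(names(groups)),
    harmonized_row_count = as.numeric(lengths(groups)),
    # sd() of a single value is NA, as in the rows above with a count of 1
    harmonized_value_sd = unname(vapply(groups, sd, numeric(1))),
    harmonized_value = unname(vapply(groups, mean, numeric(1)))
  )
}
```

For example, `to_utc("2015-05-19 15:30:00", "America/New_York")` gives `"2015-05-19 19:30:00"`, which matches the first row of `harmonized_utc` in the output above.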

mbrousil commented 5 months ago

@steeleb could you just take a quick look at the most recent changes I made?