AquaSat / AquaMatch_harmonize_WQP

https://aquasat.github.io/AquaMatch_harmonize_WQP/
MIT License

Change time handling, undo .gitignore of chapters #97

Closed mbrousil closed 4 months ago

mbrousil commented 4 months ago

Hey all,

This PR fixes issues with time handling, updates the bookdown to describe the time methods (and to respond to a request from Jack), and tries to fix the images not loading on the bookdown site. @steeleb, if you're able to do a thorough review of the time-related changes that would be super helpful, though I know ASLO is next week, so just lmk when you're able to!

  1. Time handling: I realized that the ActivityStartDateTime column is a product of dataRetrieval and is in fact in UTC rather than local time, so I've updated fill_date_time() in 3_harmonize/src/clean_wqp_data.R accordingly. It should now produce harmonized_local_time, harmonized_tz, and harmonized_utc columns. Note that harmonized_local_time is character format, not datetime, because a datetime column in R can carry only a single tz. About 25% of our harmonized_utc times should be 1 hour off of ActivityStartDateTime; the vast majority of these are in the same direction (see the reprex at the bottom). Ultimately it was more straightforward to handle DST inconsistencies in the data by letting {lubridate} apply DST based on location + date rather than relying on the time zone abbreviations in the dataset. This is in part because time zone strings like "CST" produce errors, while location-based ones like "America/Chicago" don't. I've also added the ActivityStartTime.Time column to the aggregated output in this version; it seemed best to include it with the rest of the date/time info for full usability.
  2. The bookdown has been edited to explain the time changes above and to give a quick explanation of what AquaSat v2 is.
  3. I think the root of the issue with the bookdown site not loading images is that the docs/chapters/ folder was not being tracked on GitHub. The old name of this folder, _book/chapters, was in .gitignore, and I think that carried over. All of those files are now tracked as part of this PR.
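For reviewers who want to poke at the time logic in item 1 without loading the pipeline, here is a minimal sketch of the location-based approach. It is not the code in fill_date_time(); the `samples` data frame and its column names are hypothetical, and it assumes a lubridate version that provides `force_tzs()`.

```r
library(lubridate)

# Hypothetical sample data (not from the WQP pull): local clock times plus a
# location-based time zone name for each record
samples <- data.frame(
  local_clock = c("2004-07-13 10:10:00", "2004-01-13 10:10:00"),
  tz = c("America/New_York", "America/Chicago"),
  stringsAsFactors = FALSE
)

# Parse the clock times without committing to a zone yet (ymd_hms defaults to UTC)
naive <- ymd_hms(samples$local_clock)

# Reinterpret each clock time in its own zone and return a single UTC column;
# DST is applied from the location + date, not from an abbreviation
samples$utc <- force_tzs(naive, tzones = samples$tz, tzone_out = "UTC")

# A datetime vector carries one tzone attribute, so per-row local times are
# stored as text, formatted one row at a time
samples$local_chr <- vapply(
  seq_len(nrow(samples)),
  function(i) {
    format(samples$utc[i], format = "%Y-%m-%d %H:%M:%S %Z", tz = samples$tz[i])
  },
  character(1)
)

print(samples)
```

The July record picks up daylight time and the January record stays on standard time, with no abbreviation supplied by hand. This is why the local-time column ends up as character: the UTC column is the one to use for arithmetic and joins.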

I have the current output uploaded to Drive if you need it! Let me know if there's any more info, etc. that you all need to review this. Thanks!

library(targets)
library(tidyverse)
library(feather)
library(kableExtra)

tar_load(p3_chla_agg_harmonized_feather)

p3_chla_agg_harmonized_feather %>%
  mutate(utc_diff = as.numeric(ymd_hms(ActivityStartDateTime) - harmonized_utc)) %>%
  ggplot() +
  geom_histogram(aes(utc_diff / 60^2)) +
  theme_bw()
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `utc_diff = as.numeric(ymd_hms(ActivityStartDateTime) -
#>   harmonized_utc)`.
#> Caused by warning:
#> !  2420 failed to parse.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 404362 rows containing non-finite outside the scale range
#> (`stat_bin()`).


p3_chla_agg_harmonized_feather %>%
  mutate(utc_diff = as.numeric(ymd_hms(ActivityStartDateTime) - harmonized_utc)) %>%
  filter(utc_diff > 0) %>%
  select(harmonized_local_time, ActivityStartDateTime, harmonized_utc) %>%
  head(10)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `utc_diff = as.numeric(ymd_hms(ActivityStartDateTime) -
#>   harmonized_utc)`.
#> Caused by warning:
#> !  2420 failed to parse.
#> # A tibble: 10 × 3
#>    harmonized_local_time   ActivityStartDateTime harmonized_utc     
#>    <chr>                   <dttm>                <dttm>             
#>  1 2004-07-13 10:10:00 EDT 2004-07-13 15:10:00   2004-07-13 14:10:00
#>  2 2004-08-10 09:50:00 EDT 2004-08-10 14:50:00   2004-08-10 13:50:00
#>  3 2004-09-14 10:10:00 EDT 2004-09-14 15:10:00   2004-09-14 14:10:00
#>  4 2004-07-13 11:20:00 EDT 2004-07-13 16:20:00   2004-07-13 15:20:00
#>  5 2004-08-10 11:00:00 EDT 2004-08-10 16:00:00   2004-08-10 15:00:00
#>  6 2004-08-30 10:30:00 EDT 2004-08-30 15:30:00   2004-08-30 14:30:00
#>  7 2004-09-08 11:45:00 EDT 2004-09-08 16:45:00   2004-09-08 15:45:00
#>  8 2004-09-09 07:25:00 EDT 2004-09-09 12:25:00   2004-09-09 11:25:00
#>  9 2004-09-14 11:25:00 EDT 2004-09-14 16:25:00   2004-09-14 15:25:00
#> 10 2004-09-18 05:20:00 EDT 2004-09-18 10:20:00   2004-09-18 09:20:00

Created on 2024-05-30 with reprex v2.1.0
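The one-hour gap in the table above can be reproduced in isolation. The sketch below takes the clock time from row 1 and converts it to UTC twice: once with a location-based zone and once with a fixed standard-time offset. That the upstream ActivityStartDateTime values were built with a standard-time offset is an assumption made for illustration; the PR does not confirm it.

```r
library(lubridate)

# Clock time taken from row 1 of the table above
clock <- "2004-07-13 10:10:00"

# Location-based zone: DST is in effect in July, so the offset is UTC-4
utc_location <- with_tz(ymd_hms(clock, tz = "America/New_York"), "UTC")

# Fixed standard-time offset (UTC-5), used here as an assumed stand-in for an
# "EST" label; in the POSIX naming convention "Etc/GMT+5" is five hours
# *behind* UTC
utc_fixed <- with_tz(ymd_hms(clock, tz = "Etc/GMT+5"), "UTC")

# Request the units explicitly so the result does not depend on difftime's
# automatic unit selection
diff_hours <- as.numeric(difftime(utc_fixed, utc_location, units = "hours"))

print(utc_location)
print(utc_fixed)
print(diff_hours)
```

Passing `units` to `difftime()` is a small safeguard. Subtracting two datetimes directly lets R choose the units, so dividing the result by 60^2, as the reprex does, is only correct when R happens to choose seconds.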

mbrousil commented 4 months ago

Merging and opening new PR with new changes