DOI-USGS / lake-temperature-model-prep


Add iadnr data #284

Closed padilla410 closed 2 years ago

padilla410 commented 2 years ago

Data summary:

I think the major reason for the loss here is the mismatch between the raw data and the existing lookup table. The existing lookup table has 132 lakes while the raw data has 176. We could easily pick up the missing lakes if we had the rest of the spatial info.
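A minimal sketch of how the lakes missing from the lookup table could be listed. The IDs, the `lookup` data frame, and the `site_id` column are hypothetical and for illustration only; the real values come from the raw Iowa DNR files and the existing lookup table.

```r
# Hypothetical lake IDs standing in for the IDs found in the raw files
raw_ids <- c("lake_a", "lake_b", "lake_c", "lake_d")

# Hypothetical lookup table; column names are assumptions
lookup <- data.frame(
  state_id = c("lake_a", "lake_c"),
  site_id = c("nhdhr_1", "nhdhr_2"),
  stringsAsFactors = FALSE
)

# Lakes present in the raw data but absent from the lookup table
unmatched <- setdiff(raw_ids, lookup$state_id)
unmatched
length(unmatched)
```

The length of `unmatched` is the number of lakes that cannot be linked until spatial info is available for them.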

A few checks:

library(scipiper)
library(arrow)
library(dplyr) # provides %>% and filter()

# load current version of data set on gdrive
dat_gdrive <- read_feather(sc_retrieve("7a_temp_coop_munge/out/all_coop_dat_linked.feather.ind"))

# filter for recent IA DNR data
dat_iadnr <- dat_gdrive %>% 
  filter(source == '7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.rds')

# check record count
nrow(dat_iadnr)

# check lake count
length(unique(dat_iadnr$state_id))

The outputs:

> nrow(dat_iadnr)
[1] 491109

> length(unique(dat_iadnr$state_id))
[1] 113

Snapshot of 8_viz/out/lakes_summary_fig.html: [image]

jordansread commented 2 years ago

I end up with an error when trying to parse the files:

Error in make.names(col.names, unique = TRUE): invalid multibyte string 5; debug with scmake("7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.rds.ind", "7a_temp_co...

I will try to track down which file that is. Perhaps a Windows/Mac difference in one of the parsers or files?

jordansread commented 2 years ago

Iowa_DNR_LimnoProfiles_2000-2020/Iowa_profiles_2017b/17129263ysi.csv causes the error for me.

It is addressed by adding the fileEncoding argument to read.csv:

read.csv(x, fileEncoding="latin1")

in the else branch of parse_2017_2020_data().
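A small, self-contained sketch of why declaring the encoding helps, assuming a UTF-8 session. The temporary file and its column names are made up for illustration; they are not taken from the Iowa DNR files.

```r
# Hypothetical file: a CSV whose header contains a Latin-1 degree sign (byte 0xB0),
# which is not valid UTF-8 on its own
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(
  c(charToRaw("depth,temp_"), as.raw(0xB0), charToRaw("C\n1,20.5\n2,19.8\n")),
  con
)
close(con)

# Declaring the file encoding lets R re-encode the text before
# make.names() processes the column names
dat <- read.csv(tmp, fileEncoding = "latin1")
names(dat)
nrow(dat)
```

Without `fileEncoding`, reading the same file in a UTF-8 locale can fail in `make.names()` with an "invalid multibyte string" error like the one reported above.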

lindsayplatt commented 2 years ago

I just did a before/after with the new HTML and the old one. Iowa is on the map for PGDL now 🎉

[image]

padilla410 commented 2 years ago

OK! I resolved all of the comments and successfully ran scmake(7a_temp_coop_munge) locally (with options(scipiper.dry_put = TRUE)). I intend to do a run that talks to gdrive after I finish a few summary tasks for a 3 pm ET meeting today.