ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Update `ingest` stage to use `noctua` `unload = TRUE` option #242

Closed: Damonamajor closed 2 weeks ago

Damonamajor commented 1 month ago

When implementing the noctua `unload = TRUE` option, values for `tax_loc` would be replaced with . To remedy this, a modification was implemented in the ingest script to handle the values by count: concatenate them when there is more than one observation, extract the single value when there is exactly one observation, or replace with NULL when there are zero observations.
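For reference, the array handling described above could look roughly like the sketch below. This is not the code from the ingest script: the helper name `collapse_array_col`, the `", "` separator, and the use of `NA` to stand in for NULL inside a vector are all assumptions made for illustration.

```r
# Hypothetical helper (not from the ingest script): collapse a list-column
# of arrays into a plain character vector.
#   - more than one value -> concatenated with `sep` (separator is an assumption)
#   - exactly one value   -> that value
#   - zero values / NULL  -> NA (standing in for NULL inside a vector)
collapse_array_col <- function(x, sep = ", ") {
  vapply(
    x,
    function(el) {
      el <- el[!is.na(el)]  # drop missing entries before counting
      if (length(el) == 0) {
        NA_character_
      } else {
        paste(el, collapse = sep)
      }
    },
    character(1)
  )
}

# Made-up example values covering the three cases
tax_loc <- list(c("A", "B"), "C", character(0), NULL)
collapsed <- collapse_array_col(tax_loc)
print(collapsed)
```

The helper always returns one string per row, so the result can be assigned straight back to a data frame column.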

The package Rcompare was used to compare the output files produced with `unload` set to TRUE and FALSE. Aside from the handling of arrays, the only other difference appears to be rounding at different intervals. I'm going to run another test over the weekend to double-check this, but it takes forever on my computer. Values are also sorted differently, which is probably not important but may be worthwhile to note for the future.
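A comparison that ignores both row order and small rounding differences can be sketched in base R as below. This is not the comparison package mentioned above; the `normalize` helper, the `pin` key column, the sample values, and the `1e-3` tolerance are all assumptions chosen for illustration.

```r
# Hypothetical helper: sort by a key column and reset row names so that
# row order does not affect the comparison.
normalize <- function(df, key) {
  df <- df[order(df[[key]]), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Made-up outputs: same rows, different order, different rounding
with_unload <- data.frame(
  pin = c("03", "01", "02"),
  value = c(3.14159, 1.5, 2.71828),
  stringsAsFactors = FALSE
)
without_unload <- data.frame(
  pin = c("01", "02", "03"),
  value = c(1.5, 2.718, 3.142),
  stringsAsFactors = FALSE
)

a <- normalize(with_unload, "pin")
b <- normalize(without_unload, "pin")

# all.equal() compares numeric columns up to a relative tolerance;
# isTRUE() turns its "TRUE or a message" result into a plain logical.
same_within_tol <- isTRUE(all.equal(a, b, tolerance = 1e-3))
same_default <- isTRUE(all.equal(a, b))
print(c(within_tolerance = same_within_tol, default_tolerance = same_default))
```

Comparing at two tolerances makes it easier to tell whether a mismatch comes from rounding alone or from a real difference in values.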

When I run renv::install("DyfanJones/noctua"), it doesn't update the renv in GitHub. My assumption is that this needs to be done from the admin side.

This query won't work unless year is manually set to 2022, but I don't imagine that's an issue with noctua.

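```r
library(DBI)
library(glue)

# AWS_ATHENA_CONN_NOCTUA is the existing noctua Athena connection.
# The year is hard-coded to 2022 because the query fails otherwise.
land_site_rate_data <- dbGetQuery(
  conn = AWS_ATHENA_CONN_NOCTUA,
  glue("
    SELECT *
    FROM ccao.land_site_rate
    WHERE year = '2022'
  ")
)
```

dfsnow commented 2 weeks ago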

@Damonamajor I had to make some minor tweaks to wrap this up, mostly stuff related to updating the renv lockfile that manages the noctua version (since we need the latest version for unload = TRUE to work). See 8aae9525773e79981dea31947236a6e219ba3370 for the changes.