ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Update corner lot data output and script locations #274

Open wrridgeway opened 6 months ago

wrridgeway commented 6 months ago

Currently our corner lot script lives in the aws-s3/raw data-architecture folder, though it doesn't collect and upload any raw data. It should probably be moved to the warehouse folder, and should also be renamed from a spatial prefix to ccao since it doesn't upload any spatial data. I've included some rough code to help upload the data in its current form to the warehouse.

library(arrow)
library(DBI)
library(noctua)
library(readr)
source("utils.R")

# This script uploads corner lots to the S3 warehouse
AWS_S3_WAREHOUSE_BUCKET <- Sys.getenv("AWS_S3_WAREHOUSE_BUCKET")
output_bucket <- file.path(AWS_S3_WAREHOUSE_BUCKET, "ccao", "other", "corner_lot")

# Optionally enable query caching
noctua_options(cache_size = 10)

# Establish connection
AWS_ATHENA_CONN_NOCTUA <- dbConnect(noctua::athena())

# Grab townships for pin10 from iasWorld
townships <- dbGetQuery(
  conn = AWS_ATHENA_CONN_NOCTUA, "
  SELECT distinct
    substr(parid, 1, 10) AS pin10,
    user1 as township
  FROM iasworld.legdat
  WHERE user1 IS NOT NULL
    AND cur = 'Y'
    AND deactivat IS NULL
  ")

read_delim("Corners Final.csv") %>%
  summarise(
    is_corner_lot = as.logical(max(fnl_rsl, na.rm = TRUE)),
    .by = pin10
    ) %>%
  left_join(townships, by = "pin10") %>%
  group_by(township) %>%
  write_partitions_to_s3(output_bucket, is_spatial = FALSE, overwrite = TRUE)
wrridgeway commented 6 months ago

Additionally: