Open kzollove opened 12 months ago
TODO:
Elaboration of SQL approach
Robert Miller (Guest), can you update this? Use this issue as necessary. I would suggest a high-level approach, but use that ticket however you'd like to organize around this effort.
Example common complexities to be addressed:
Data Source:
geom_local_epsg=sf::st_crs(staged, parameters=TRUE)$epsg)"]}
geom_name=dplyr::select(sf::st_drop_geometry(staged), n = if('NAME' %in% colnames(staged)) 'NAME' else 'NAMELSAD')$n
geom_local_value=sf::st_as_binary(sf::st_as_sf(staged, coords=c('Latitude', 'Longitude'))$geometry)
Similar to the above, example complexities for attributes:
Variable source:
(same example with two pieces)
1) ["dplyr::filter(staged, `Defining Parameter`=='Ozone')",
2) "dplyr::mutate(staged, geom_join_column=paste0(stringr::str_pad(`State Code`,width=2,pad=0), stringr::str_pad(`County Code`,width=3,pad=0)),
Another (handling hard-coded values in general):
... mutate(staged,geom_join_column=FIPS, attr_concept_id=2000000001, attr_start_date=as.Date('2018-01-01'),attr_end_date=as.Date('2018-12-31'),
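As a rough illustration of what translating these variable-source transforms to SQL could look like, here is a minimal sketch. It uses Python's built-in `sqlite3` only so that it runs anywhere; the `staged` table layout and the sample rows are assumptions made up for the example, and SQLite stands in for the PostGIS database discussed in this issue. It combines the filter, the zero-padded join column, and the hard-coded attribute values from the snippets above into one query.

```python
import sqlite3

# Hypothetical staging table: column names mirror the R snippets above,
# but the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE staged ('
    '"State Code" INTEGER, "County Code" INTEGER, "Defining Parameter" TEXT)'
)
conn.executemany(
    "INSERT INTO staged VALUES (?, ?, ?)",
    [(1, 3, "Ozone"), (6, 37, "Ozone"), (6, 37, "PM2.5")],
)

# SQL counterpart of dplyr::filter + dplyr::mutate:
#   - WHERE replaces the filter on `Defining Parameter`
#   - printf('%02d', ...) / printf('%03d', ...) replace stringr::str_pad
#     (in PostgreSQL, lpad(col::text, 2, '0') would be the likely equivalent)
#   - hard-coded values become constant columns in the SELECT list
query = """
SELECT
    printf('%02d', "State Code") || printf('%03d', "County Code")
        AS geom_join_column,
    2000000001         AS attr_concept_id,
    DATE('2018-01-01') AS attr_start_date,
    DATE('2018-12-31') AS attr_end_date
FROM staged
WHERE "Defining Parameter" = 'Ozone'
ORDER BY geom_join_column
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Whether every transform in the existing specs can be expressed this way (the geometry ones in particular) is the open question; this only covers the simple filter, pad, and constant cases.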
Third item to specify:
Lay out the implications of staging source data in a database (PostGIS) rather than the current approach of keeping it in memory
Fourth item: adding clarity on the "phases" that were mentioned and how the specific functionality falls under each:
1) Ingestion (CLI, ogr, others)
2) Translation (can we do this comprehensively in SQL?)
3) Extraction/population of exposure occurrence (arguably out of scope for this conversation)