OHDSI / GIS

https://ohdsi.github.io/GIS
Apache License 2.0

Decouple geom and attr spec logic away from R, towards generic JSON #285

Open kzollove opened 12 months ago

kzollove commented 12 months ago
rtmill commented 7 months ago

TODO:

Elaboration of SQL approach

Robert Miller (Guest), can you update this? Use this issue as necessary. I would suggest a high-level approach, but use the ticket however you'd like to organize around this effort.

Example common complexities to be addressed:

Data Source:

geom_local_epsg=sf::st_crs(staged, parameters=TRUE)$epsg
geom_name=dplyr::select(sf::st_drop_geometry(staged), n = if('NAME' %in% colnames(staged)) 'NAME' else 'NAMELSAD')$n
geom_local_value=sf::st_as_binary(sf::st_as_sf(staged, coords=c('Latitude', 'Longitude'))$geometry)
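One way the three expressions above might look once decoupled from R: a minimal sketch that renders a generic JSON-style spec into a PostGIS query. The spec keys (`first_existing`, `geometry`, `column`, `point`), the `geom` column name, and SRID 4326 for the point case are all assumptions for illustration, not an agreed schema. Note that "use NAME if it exists, else NAMELSAD" cannot be written as plain SQL against a missing column, so this sketch resolves it at render time from the staged table's column list.

```python
def quote_ident(name):
    """Double-quote an identifier so mixed-case column names survive."""
    return '"' + name.replace('"', '""') + '"'


def render_geom_sql(spec, columns):
    """Render a SELECT for the geom_* fields from a hypothetical JSON-style spec.

    `columns` is the list of columns present in the staged table.
    """
    # geom_name: first candidate column that actually exists, mirroring
    # the R test `if ('NAME' %in% colnames(staged))`.
    candidates = spec["geom_name"]["first_existing"]
    name_col = next((c for c in candidates if c in columns), None)
    if name_col is None:
        raise ValueError(f"none of {candidates} found in staged columns")

    geometry = spec["geometry"]
    if "column" in geometry:
        # Source already carries a geometry column: read its SRID.
        geom_expr = quote_ident(geometry["column"])
        epsg_expr = f"ST_SRID({geom_expr})"
    else:
        # Source has coordinate columns: build points (x = longitude, y = latitude).
        point = geometry["point"]
        x_col = quote_ident(point["x"])
        y_col = quote_ident(point["y"])
        srid = int(point["srid"])  # assumed; the R snippet does not state a CRS
        geom_expr = f"ST_SetSRID(ST_MakePoint({x_col}, {y_col}), {srid})"
        epsg_expr = str(srid)

    return (
        f"SELECT {quote_ident(name_col)} AS geom_name, "
        f"{epsg_expr} AS geom_local_epsg, "
        f"ST_AsBinary({geom_expr}) AS geom_local_value "
        f"FROM {quote_ident(spec['table'])}"
    )


# Hypothetical specs: one polygon source, one lat/lon point source.
POLYGON_SPEC = {
    "table": "staged",
    "geom_name": {"first_existing": ["NAME", "NAMELSAD"]},
    "geometry": {"column": "geom"},
}
POINT_SPEC = {
    "table": "staged",
    "geom_name": {"first_existing": ["NAME", "NAMELSAD"]},
    "geometry": {"point": {"x": "Longitude", "y": "Latitude", "srid": 4326}},
}

polygon_sql = render_geom_sql(POLYGON_SPEC, ["GEOID", "NAMELSAD", "geom"])
point_sql = render_geom_sql(POINT_SPEC, ["NAME", "Latitude", "Longitude"])
print(polygon_sql)
print(point_sql)
```

The spec stays declarative (no embedded R or SQL strings), and the renderer is the only place that knows about PostGIS function names.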
rtmill commented 7 months ago

Similar to the above, example complexities for attributes:

Variable source (same example, in two pieces):

1) ["dplyr::filter(staged,`Defining Parameter`=='Ozone')",
2) "dplyr::mutate(staged,geom_join_column=paste0(stringr::str_pad(`State Code`,width=2,pad=0),stringr::str_pad(`County Code`,width=3,pad=0)),

Another (handling hard-coding in general): ... mutate(staged,geom_join_column=FIPS, attr_concept_id=2000000001, attr_start_date=as.Date('2018-01-01'),attr_end_date=as.Date('2018-12-31'),
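To gauge whether the attribute cases above translate to SQL, here is a rough sketch that runs the filter, the zero-padded join column, and the hard-coded values as a single query. It uses an in-memory SQLite database purely as a stand-in so the example is runnable without a server; the sample rows are made up, and in PostgreSQL the padding would more likely be `LPAD("State Code"::text, 2, '0')` with the dates written as `DATE '2018-01-01'`.

```python
import sqlite3

# In-memory stand-in for the staged table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE staged ('
    '"State Code" INTEGER, "County Code" INTEGER, "Defining Parameter" TEXT)'
)
conn.executemany(
    "INSERT INTO staged VALUES (?, ?, ?)",
    [(6, 37, "Ozone"), (6, 37, "PM2.5"), (36, 5, "Ozone")],
)

# filter + mutate from the R example, expressed as one SELECT:
#   - WHERE replaces dplyr::filter
#   - printf zero-pads state (2 digits) and county (3 digits), like str_pad
#   - constants replace the hard-coded mutate() arguments
TRANSLATE_SQL = """
SELECT
    printf('%02d', "State Code") || printf('%03d', "County Code") AS geom_join_column,
    2000000001   AS attr_concept_id,
    '2018-01-01' AS attr_start_date,
    '2018-12-31' AS attr_end_date
FROM staged
WHERE "Defining Parameter" = 'Ozone'
ORDER BY geom_join_column
"""

rows = conn.execute(TRANSLATE_SQL).fetchall()
for row in rows:
    print(row)
```

Both pieces of the R example collapse into one statement here, which suggests the hard-coded values could live in the JSON spec as plain literals rather than as R expressions.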

rtmill commented 7 months ago

Third item to specify:

Lay out the implications of staging source data in a database (PostGIS) rather than the current approach of keeping it in memory.

Fourth item: add clarity on the "phases" that were mentioned and how the specific functionality falls under each:

1) Ingestion (CLI, ogr, others)
2) Translation (can we do this comprehensively in SQL?)
3) Extraction/population of exposure occurrence (arguably out of scope for this conversation)
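For the ingestion phase, a possible shape is a thin wrapper that assembles an `ogr2ogr` call to load a source file into a staging table, leaving all translation to SQL afterwards. This is only a sketch: the file name, connection string, table name, and geometry column name are placeholders, and the command is built but not executed.

```python
import shlex


def build_ogr2ogr_command(source_path, pg_conn, table="staged", geometry_column="geom"):
    """Build an ogr2ogr argument list that loads a source file into PostGIS.

    All argument values are caller-supplied; nothing here is specific to
    the existing R implementation.
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",            # output driver
        f"PG:{pg_conn}",               # destination datasource comes first
        source_path,                   # then the source datasource
        "-nln", table,                 # name of the layer (table) to create
        "-lco", f"GEOMETRY_NAME={geometry_column}",
        "-overwrite",                  # replace the staging table if present
    ]


# Placeholder inputs for illustration only.
cmd = build_ogr2ogr_command("counties.shp", "dbname=gaia host=localhost")
print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` on a machine with GDAL installed; keeping it as a list avoids shell-quoting problems with the connection string.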