OHDSI / GIS

https://ohdsi.github.io/GIS
Apache License 2.0

Define the set of geography terms needed for Symposium #230

Closed kzollove closed 11 months ago

kzollove commented 1 year ago

geom/data type:

"boundary" type:

rtmill commented 1 year ago

For this use case, we need the "type" of political boundary in the US. MTFCC (MAF/TIGER Feature Class Code) is a vocabulary we can use.

The relevant codes have "Tabulation Area" in the "Superclass" column: https://www2.census.gov/geo/pdfs/reference/mtfccs2022.pdf
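
A minimal sketch of that selection step, assuming the MTFCC table has already been transcribed into rows with `mtfcc`, `feature_class`, and `superclass` keys (those key names, the helper `boundary_type_codes`, and the handful of sample rows are illustrative assumptions, not part of the GIS codebase; check every code and superclass against the Census PDF before relying on it):

```python
# Illustrative rows only: a few MTFCC entries hand-copied for this sketch.
# Verify codes and superclasses against the Census MTFCC reference before use.
SAMPLE_MTFCC = [
    {"mtfcc": "G4000", "feature_class": "State or Equivalent Feature",
     "superclass": "Tabulation Area"},
    {"mtfcc": "G4020", "feature_class": "County or Equivalent Feature",
     "superclass": "Tabulation Area"},
    {"mtfcc": "G5020", "feature_class": "Census Tract",
     "superclass": "Tabulation Area"},
    {"mtfcc": "S1100", "feature_class": "Primary Road",
     "superclass": "Road/Path Features"},
]


def boundary_type_codes(rows, superclass="Tabulation Area"):
    """Return the rows whose superclass matches, ignoring case and padding."""
    wanted = superclass.strip().lower()
    return [row for row in rows if row["superclass"].strip().lower() == wanted]


# Keep only the candidate "boundary type" terms.
for row in boundary_type_codes(SAMPLE_MTFCC):
    print(row["mtfcc"], row["feature_class"])
```

Filtering on the superclass rather than on a hard-coded list of codes means the selection can be re-run if the reference table is updated.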

rtmill commented 1 year ago

For the type of geometry, it looks like OGC maintains a list. I can't find any documentation that assigns unique codes, but the list is:

"Geometry is an abstract type. Geometry values belong to one of its concrete subtypes which represent various kinds and dimensions of geometric shapes. These include the atomic types Point, LineString, LinearRing and Polygon, and the collection types MultiPoint, MultiLineString, MultiPolygon and GeometryCollection."

Context: "The Open Geospatial Consortium (OGC) developed the Simple Features Access standard (SFA) to provide a model for geospatial data. It defines the fundamental spatial type of Geometry, along with operations which manipulate and transform geometry values to perform spatial analysis tasks. PostGIS implements the OGC Geometry model as the PostgreSQL data types geometry and geography."
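
Since there is no official coding to lean on, one way to work with the quoted list is to treat the type names themselves as the keys. The sketch below is a hedged illustration, not existing project code: it groups the names from the quote into atomic and collection types and reads the type off the leading keyword of a WKT string. The function names are hypothetical, and EWKT prefixes such as `SRID=4326;` are deliberately not handled.

```python
import re

# Type names exactly as listed in the quoted description.
ATOMIC_TYPES = {"Point", "LineString", "LinearRing", "Polygon"}
COLLECTION_TYPES = {
    "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection",
}

# Upper-case WKT keyword -> canonical mixed-case name.
_CANONICAL = {name.upper(): name for name in ATOMIC_TYPES | COLLECTION_TYPES}


def geometry_type_from_wkt(wkt):
    """Return the canonical type name for the leading keyword of a WKT string."""
    match = re.match(r"\s*([A-Za-z]+)", wkt)
    if match is None:
        raise ValueError(f"no geometry keyword found in {wkt!r}")
    keyword = match.group(1).upper()
    if keyword not in _CANONICAL:
        raise ValueError(f"unrecognized geometry type {match.group(1)!r}")
    return _CANONICAL[keyword]


def is_collection(type_name):
    """True for the collection types, False for everything else."""
    return type_name in COLLECTION_TYPES


print(geometry_type_from_wkt("POINT (30 10)"))
print(is_collection(geometry_type_from_wkt("MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))")))
```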

p-talapova commented 1 year ago

The table below shows proposed geometry type concepts for potential integration into the OMOP Vocabulary.

| geom_type_concept | description | research_question_example |
| --- | --- | --- |
| Point | A single location in space. | Where are the highest concentrations of reported flu cases in a city? |
| Polygon | A two-dimensional surface stored as a sequence of points defining its exterior boundary. | Which neighborhoods have the highest rates of childhood asthma? |
| Raster | Pixel-based data often used for aerial photos, satellite imagery, and elevation models. | How do patterns of green space in a city correlate with rates of mental health issues? |
| LineString | A simple line with a start and end point. | How do pollution levels change along a city's major highways? |
| GeometryCollection | A collection of different geometry types stored as a single record. | What combination of health facilities and reported disease cases are there in a given region? |
| MultiPoint | Represents a collection of points. | Where have multiple outbreaks of a specific infectious disease been reported in a country? |
| MultiLineString | Represents a collection of line strings. | What are the travel routes of patients diagnosed with a contagious illness? |
| MultiPolygon | Represents a collection of polygons. | Which neighborhoods in California have been repeatedly affected by respiratory issues and complications such as asthma exacerbations and bronchitis due to smoke exposure from wildfires over the past decade? |
| CurvePolygon* | Similar to a polygon but with boundaries that can be curves. | In what regions do we see a non-linear increase in respiratory diseases due to factory emissions? |
| TIN* (Triangulated Irregular Network) | Represents a surface as a set of contiguous, non-overlapping triangles. | How does terrain elevation correlate with the spread of mosquito-borne diseases? |
| Solid* | Represents a three-dimensional volume bounded by a polyhedral surface. | How does proximity to underground toxins correlate with rates of specific illnesses? |

* - do we need this?

@rtmill @kzollove Could you please provide your insights on any concepts that might appear redundant/missing and check if we have data available in all these specified formats?

kzollove commented 1 year ago

@p-talapova Sorry for the late response...

For the symposium, we need Point, Polygon, and MultiPolygon. (To your question: these are the only types of data we currently have.)

The "next steps" (not necessary for the symposium) would be LineString, MultiLineString, MultiPoint, and GeometryCollection. These could likely be supported without major code refactors.

Raster is something that would likely require us to change our approach dramatically. It will be very important but is out of scope for now.
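
For reference, the scoping above could be written down as a small lookup. This is only a sketch: the tier labels and the `support_tier` helper are made-up names, and any type not mentioned in this comment (CurvePolygon, TIN, Solid) is reported as undecided rather than guessed at.

```python
# Tiers as described in this comment; the labels themselves are hypothetical.
SUPPORT_TIERS = {
    "symposium": ("Point", "Polygon", "MultiPolygon"),
    "next_steps": ("LineString", "MultiLineString", "MultiPoint",
                   "GeometryCollection"),
    "out_of_scope": ("Raster",),
}


def support_tier(geom_type):
    """Return the tier a geometry type falls into, or 'undecided' if unlisted."""
    for tier, types in SUPPORT_TIERS.items():
        if geom_type in types:
            return tier
    return "undecided"


for name in ("Point", "MultiPoint", "Raster", "TIN"):
    print(name, "->", support_tier(name))
```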