rtmill opened this issue 5 years ago (status: Open)
Text View
JSON:
SQL (MS)
It appears everything in the cohort definition boils down to smaller data sets that are joined. One approach would be to have the spatial queries run before the SQL and populate tables, most likely temporary, that are then included in the SQL joins.
Using example 1 from above (patients who lived in area x): an R function would find all people who lived in area x and populate a temporary table with their 'person_id' values. That table is then referenced and joined in the SQL statement, and dropped after execution.
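To make the staged approach concrete, here is a minimal sketch, not a proposed implementation. It uses Python and an in-memory SQLite database as stand-ins for the R function and the CDM database. The `person_location` table, the bounding-box test, and the `spatial_` temp-table prefix are all assumptions made up for illustration; a real spatial step would call a proper GIS library.

```python
import sqlite3
import uuid


def make_demo_db():
    """Build a toy database. `person_location` is a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE person_location ("
        "person_id INTEGER, latitude REAL, longitude REAL)"
    )
    conn.executemany("INSERT INTO person VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO person_location VALUES (?, ?, ?)",
        [(1, 40.7, -74.0), (2, 34.0, -118.2), (3, 40.9, -73.8)],
    )
    return conn


def point_in_bbox(lat, lon, bbox):
    """Stand-in for a real spatial predicate (assumption: area x is a box)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def build_spatial_cohort(conn, bbox):
    # Stage 1: run the spatial query outside SQL.
    rows = conn.execute(
        "SELECT person_id, latitude, longitude FROM person_location"
    ).fetchall()
    matched = sorted({pid for pid, lat, lon in rows if point_in_bbox(lat, lon, bbox)})

    # Stage 2: populate a uniquely named temporary table with person_id.
    temp_name = "spatial_" + uuid.uuid4().hex
    conn.execute(f"CREATE TEMP TABLE {temp_name} (person_id INTEGER PRIMARY KEY)")
    conn.executemany(
        f"INSERT INTO {temp_name} (person_id) VALUES (?)",
        [(pid,) for pid in matched],
    )
    try:
        # Stage 3: the cohort SQL joins against the temporary table.
        cohort = conn.execute(
            f"SELECT p.person_id FROM person p "
            f"JOIN {temp_name} s ON s.person_id = p.person_id "
            f"ORDER BY p.person_id"
        ).fetchall()
    finally:
        # Stage 4: drop the temporary table after execution.
        conn.execute(f"DROP TABLE {temp_name}")
    return [pid for (pid,) in cohort]
```

Calling `build_spatial_cohort(make_demo_db(), (40.0, -75.0, 41.0, -73.0))` runs all four stages in order. Generating the table name per run is one possible answer to the naming question, since concurrent executions would not collide.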
How that would be completed functionally, specifically with staggered execution and consistent naming, and how these functions could be represented in the JSON object is unclear. Perhaps a conversation with someone from the WebAPI WG?
@rtmill : Are you getting any input on these?
@cgreich I had a great call with @anthonysena, and I believe the plan is to discuss this with the ATLAS/WebAPI WG and then involve the folks from Circe.
@rtmill - Updating this issue after the 2020 OHDSI Symposium where I had the opportunity to demo work done on a geospatial Atlas component. This version does allow cohorts to be built off of geospatial concepts and uses the new geospatial vocabularies in Athena: Open Street Map and US Census. This architecture is database agnostic and all relationships between locations and regions are pre-computed in the concept relationship table. Custom regions would be added to the concept table and location-region relationships would be pre-computed during ETL. It won't fit every use case but does provide a starting point.
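As a rough illustration of why pre-computed relationships keep the cohort logic database agnostic, the sketch below resolves "people located in region R" with plain joins and no spatial functions. It is not taken from the linked repositories. The schemas are heavily simplified, and the `location_concept_id` column and the `'Located in'` relationship name are assumptions invented for this example.

```python
import sqlite3


def make_demo_db():
    """Toy tables; column names beyond the basics are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, location_id INTEGER);
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            location_concept_id INTEGER  -- hypothetical link to a concept
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT
        );
        INSERT INTO person VALUES (1, 10), (2, 20), (3, 10);
        INSERT INTO location VALUES (10, 1000), (20, 2000);
        -- Pre-computed during ETL: location concept -> region concept.
        INSERT INTO concept_relationship VALUES
            (1000, 9001, 'Located in'),
            (2000, 9002, 'Located in');
        """
    )
    return conn


def persons_in_region(conn, region_concept_id, relationship_id="Located in"):
    """Return person_ids whose location is related to the given region."""
    sql = """
        SELECT DISTINCT p.person_id
        FROM person p
        JOIN location l ON l.location_id = p.location_id
        JOIN concept_relationship cr
          ON cr.concept_id_1 = l.location_concept_id
        WHERE cr.concept_id_2 = ? AND cr.relationship_id = ?
        ORDER BY p.person_id
    """
    return [row[0] for row in conn.execute(sql, (region_concept_id, relationship_id))]
```

Because the query is ordinary SQL, it could in principle be translated to any supported dialect. The trade-off is that every location-region pair has to exist in the table before the cohort is run.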
Requirements for what was implemented: https://github.com/OHDSI/WebAPI/issues/649
Code for the new features: https://github.com/OHDSI/webapi-component-geospatial https://github.com/OHDSI/atlas-component-geospatial
Video demo: https://youtu.be/6OebK5CfYo0
After listening to the discussion in the GIS WG I think the future of this work would align well with the work to modularize Atlas and integrate R components into Atlas. Arachne Execution Engine is a start at this concept and provides an R execution environment as an Atlas component.
The current paradigm in OHDSI is to package cohort definitions into JSON objects, each of which can be translated into a single SQL statement that fully defines the cohort. Some of these definitions require calculations (e.g. 'where value is between x and y'), but all of them are completed entirely in SQL. In our case, because GIS functionality is not supported consistently across DB flavors, we cannot package everything into SQL statements unless every possible calculation is precalculated and stored, which seems inadvisable if not impossible.
The question becomes: how could we expand the OHDSI cohort definition to include functionality outside of SQL?
Example use cases: