OHDSI / CommonDataModel

Definition and DDLs for the OMOP Common Data Model (CDM)
https://ohdsi.github.io/CommonDataModel
888 stars 451 forks source link

New derived table: location_distance #253

Closed pavgra closed 1 year ago

pavgra commented 5 years ago

New derived table: location_distance

Proposal Owner: Pavel Grafkin, Gowtham Rao

Discussion: https://github.com/OHDSI/CommonDataModel/issues/220, https://github.com/OHDSI/WebAPI/issues/649

Proposal overview:

Description We would like to introduce:

So, we need to know the distances between people and care sites. Even though the distance between a person and a care site represents derived information, not all of OHDSI supported DBs have geo capabilities (and therefore we cannot compute the distances inside DB) plus the computation of the distances is pretty computationally intensive to do it on-demand, therefore there is a need for pre-calculation and physical data storage.

Performance considerations

Some of the questions raised during the discussions are related to volumes of the data and performance of queries.

Amount of relations can be described by number of Person * average number of Locations per person * average number of Care Sites a person visits during life in a Location and hardly will raise up to the sizes of the main domain tables (e.g. drug exposure, condition occurrence and so on). E.g. the median number of drug exposures per person in Optum is 44, the average is 61. I doubt that each person in DB will change 10 houses while visiting 4-6 care sites with different locations during life in each of houses.

Therefore, I would assume that the table might grow to the sizes of one of the main domains tables and so the quering performance will be equal. I cannot provide actual numbers since don't have access to any CDM v6 database which would hold location_history data. Maybe @gowthamrao can share some numbers (and I will try to get numbers w/o location history from some typical datasets)

Table format

location_distance

column type comment
person_location_id integer
care_site_location_id integer
distance float
unit_concept_id integer whether the distance is stored in miles, kilometers, meters, etc

Distance pre-calculation

WITH care_site_person_locations AS (
  SELECT DISTINCT cs_lh.location_id as care_site_location_id, p.location_id AS person_location_id
  FROM
    visit_occurrence v
    JOIN care_site cs ON cs.care_site_id = v.care_site_id
    JOIN location_history cs_lh ON  cs_lh.domain_id = /* OMOP generated. Care site */ CAST(57 AS VARCHAR) AND cs.care_site_id = cs_lh.entity_id AND cs_lh.start_date < v.visit_start_datetime AND v.visit_end_datetime < COALESCE(cs_lh.end_date, CAST('2099-12-31' AS DATE))
    JOIN location cs_l ON cs_l.location_id = cs_lh.location_id
    JOIN person p ON p.person_id = v.person_id
    JOIN location_history p_lh ON  p_lh.domain_id = /* OMOP generated. Person */ CAST(56 AS VARCHAR) AND p.person_id = p_lh.entity_id AND p_lh.start_date < v.visit_start_datetime AND v.visit_end_datetime < COALESCE(p_lh.end_date, CAST('2099-12-31' AS DATE))
    JOIN location p_l ON p_l.location_id = p_lh.location_id
)
INSERT INTO location_distance
SELECT
  person_location_id,
  care_site_location_id,
  ST_DISTANCE(
      ST_GeographyFromText('POINT(' || person_location.longitude || ' ' || person_location.latitude || ')'),
      ST_GeographyFromText('POINT(' || care_site_location.longitude || ' ' || care_site_location.latitude || ')')
  ),
  2110000000
FROM care_site_person_locations
  JOIN location person_location ON person_location.location_id = person_location_id
  JOIN location care_site_location ON care_site_location.location_id = care_site_location_id;
cgreich commented 5 years ago

What kind of unit_concept_id is 2110000000?

pavgra commented 5 years ago

This should be a unit of distance, e.g. miles (so something similar to http://athena.ohdsi.org/search-terms/terms/4121361). And I believe Geo Vocabs should introduce such concepts

cgreich commented 5 years ago

Units are in domain "Unit" and vocabulary "UCUM". And the question is if we want to standardize to km and get rid of the field altogether. Otherwise it is concept_id={9536, 9546, 9363}

pavgra commented 5 years ago

Otherwise it is concept_id={9536, 9546, 9363}

Nice to know, thanks

And the question is if we want to standardize to km and get rid of the field altogether

Since it simplifies things, sounds good for me. Anyway, Atlas / other tools will be able to convert units and therefore still allow end-user to work with any unit types

vojtechhuser commented 5 years ago

So nearest LabCorp may be 5 miles and nearest dialysis facility may be 15 miles. To calculate distance, (and assuming patient home is point A) - what are the specs for point B ?

clairblacketer commented 5 years ago

This is going to be discussed at the 4/9/2019 CDM meeting

clairblacketer commented 5 years ago

Here is the link to the vote for this proposal. If it passes it will go to the dev branch first and then a later discussion an vote will be held to put it in production.

don-torok commented 5 years ago

What is disadvantage of adding longitude and latitude as attributes of location table, other than they will be NULL for the vast majority of CDMs. Then location can be determine via your query.

cgreich commented 5 years ago

http://forums.ohdsi.org/t/cdm-votes-from-09apr2019-meeting/6590/5

clairblacketer commented 1 year ago

Closing as latitude and longitude were added to the location table.