OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0

Incorrect DPC counts - does not select distinct # of patients in a concept hierarchy #2317

Closed TomWhite-MedStar closed 5 months ago

TomWhite-MedStar commented 8 months ago

Expected behavior

When we review DPC (descendant person count) from Search, we expect it to be select count(distinct person_id) across all persons who have any of the codes in that hierarchy.

Actual behavior

Instead, achilles_result_concept_count sums the number of persons for each concept_id in the hierarchy, without removing duplicate person IDs. We noticed this while reviewing patients with sickle cell anemia.

In our data, we have hundreds of patients with Sickle Cell-hemoglobin SS disease (concept_id = 22281), and the Atlas search reports a DPC of several thousand patients. However, when we create a simple cohort of anyone with code 22281 or its descendants, we get about two-thirds of the number shown as DPC.

Steps to reproduce behavior

  1. When looking for unique patients with a concept or its children, we can do this, and get the right counts:

    select count(distinct co.person_id) -- gets correct number
    from current_omop.concept_ancestor ca 
    join current_omop.condition_occurrence co
    on co.condition_concept_id = ca.descendant_concept_id 
    where ca.ancestor_concept_id = 22281
  2. However, when we do a simplified replication of the query generated by Atlas Search, we get a duplicated patient count. Note that this mirrors the code that generates the achilles_result_concept_count table (e.g. via https://atlas.ohdsi.org/WebAPI/ddl/achilles):

    with cte as (
      select co.concept_id
      from current_omop.concept co
      join current_omop.concept_ancestor ca
      on co.concept_id = ca.descendant_concept_id
      where ca.ancestor_concept_id = 22281
    )
    -- gets wrong number: sums # of patients for parent and child concepts, but does not de-duplicate
    select sum(ar.count_value) as dpc
    from current_omop_results.achilles_results ar
    join cte
    -- stratum_1 is a character column, so cast explicitly for engines without implicit conversion
    on ar.stratum_1 = cast(cte.concept_id as varchar(255))
    where ar.analysis_id = 400;
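For anyone without access to a populated CDM, the two queries above can be compared on a toy dataset. This is only a minimal sketch using Python's built-in sqlite3 module: the concept IDs (100, 101, 102), the three persons, and the unqualified table names are all made up for illustration, and the per-concept counts are computed here by hand in the shape of analysis 400 rather than by running Achilles itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table concept_ancestor (ancestor_concept_id int, descendant_concept_id int);
create table condition_occurrence (person_id int, condition_concept_id int);
create table achilles_results (analysis_id int, stratum_1 text, count_value int);

-- Hypothetical hierarchy: 100 is the ancestor of itself, 101 and 102.
insert into concept_ancestor values (100, 100), (100, 101), (100, 102);

-- Persons 1 and 2 each carry two codes from the hierarchy; person 3 carries one.
insert into condition_occurrence values (1, 100), (1, 101), (2, 101), (2, 102), (3, 102);

-- Per-concept person counts, stored the way an analysis-400-style aggregate would be.
insert into achilles_results
select 400, cast(condition_concept_id as text), count(distinct person_id)
from condition_occurrence
group by condition_concept_id;
""")

# Query 1: distinct persons across the whole hierarchy (the expected DPC).
distinct_persons = conn.execute("""
    select count(distinct co.person_id)
    from concept_ancestor ca
    join condition_occurrence co
      on co.condition_concept_id = ca.descendant_concept_id
    where ca.ancestor_concept_id = 100
""").fetchone()[0]

# Query 2: sum of the pre-aggregated per-concept counts (what the summed DPC reports).
summed = conn.execute("""
    select sum(ar.count_value)
    from achilles_results ar
    join concept_ancestor ca
      on ar.stratum_1 = cast(ca.descendant_concept_id as text)
    where ar.analysis_id = 400
      and ca.ancestor_concept_id = 100
""").fetchone()[0]

print(f"distinct persons: {distinct_persons}, summed per-concept counts: {summed}")
```

With this toy data the distinct count is 3 while the summed count is 5, because persons 1 and 2 are each counted once per concept they appear under.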

Shouldn't DPC be the count of distinct persons with the selected code hierarchy?

chrisknoll commented 8 months ago

I'm not sure we're going to be able to deal with this case. The distinct person identity is lost in the aggregations: we can't look at the aggregated results in achilles_results_ar and figure out who was in them, so we can't make a higher-level rollup (i.e., the ancestors of the DRC) that excludes duplicates. I think we should treat these numbers as an estimate, not an exact count. Otherwise we will somehow need to go back to the raw tables to figure out how to de-dupe this, and that might have to be done at the Achilles level, not the WebAPI level.
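To make "estimate" a bit more concrete: even without person identities, the per-concept counts still bracket the true distinct count. The number of distinct persons in the hierarchy is at least the largest single per-concept count and at most their sum. The helper below is a hypothetical illustration of that arithmetic only; it is not part of WebAPI or Achilles.

```python
def distinct_person_bounds(per_concept_counts):
    """Return (lower, upper) bounds on distinct persons across a set of concepts.

    lower: the largest per-concept count (everyone else could overlap with it)
    upper: the sum of per-concept counts (nobody overlaps at all)
    """
    counts = list(per_concept_counts)
    if not counts:
        return (0, 0)
    return (max(counts), sum(counts))


# Hypothetical per-concept person counts for a parent and two children.
bounds = distinct_person_bounds([1, 2, 2])
print(bounds)
```

The summed DPC is therefore the upper end of this range, and it is exact only when no person appears under more than one concept in the hierarchy.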

anthonysena commented 5 months ago

As discussed on the Atlas/WebAPI WG call, the summary stats that are used by WebAPI are coming from Achilles and this would require extension to Achilles to compute some of these stats at the person level. We'll note here that there is likely some over-counting due to the aggregation at the concept level vs. person level. We won't fix this here but want to note this for documentation and awareness.
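For reference, a person-level rollup of the kind described would have to be computed from the raw tables, before person identity is aggregated away. The following is only a rough sketch of the idea on a toy sqlite3 database. It is not an existing Achilles analysis, the function name descendant_person_counts is hypothetical, and the tables and IDs are invented for illustration.

```python
import sqlite3


def descendant_person_counts(conn):
    """Map each ancestor concept to the distinct persons under it or its descendants."""
    rows = conn.execute("""
        select ca.ancestor_concept_id, count(distinct co.person_id)
        from concept_ancestor ca
        join condition_occurrence co
          on co.condition_concept_id = ca.descendant_concept_id
        group by ca.ancestor_concept_id
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table concept_ancestor (ancestor_concept_id int, descendant_concept_id int);
create table condition_occurrence (person_id int, condition_concept_id int);

-- Hypothetical hierarchy: 100 is the parent of 101 and 102; each concept is its own descendant.
insert into concept_ancestor values
  (100, 100), (100, 101), (100, 102), (101, 101), (102, 102);

insert into condition_occurrence values (1, 100), (1, 101), (2, 101), (2, 102), (3, 102);
""")

person_counts = descendant_person_counts(conn)
print(person_counts)
```

Because the grouping happens while person_id is still available, each ancestor gets a de-duplicated count. The trade-off is that this query scans the raw event table joined to concept_ancestor, which is likely to be much more expensive on a real CDM than summing pre-aggregated rows.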