Closed TomWhite-MedStar closed 5 months ago
I'm not sure we'll be able to handle this case: distinct person identity is lost once the data is aggregated. We can't look at the aggregated results in achilles_results_ar and work out who was in them in order to build a higher-level rollup (i.e., the ancestors of the DRC) that excludes duplicates. I think we should treat these numbers as estimates rather than exact counts. Otherwise we would need to go back to the raw tables to de-duplicate, and that would probably have to be done at the Achilles level, not the WebAPI level.
As discussed on the Atlas/WebAPI WG call, the summary stats used by WebAPI come from Achilles, and fixing this would require an extension to Achilles to compute some of these stats at the person level. We'll note here that there is likely some over-counting because aggregation happens at the concept level rather than the person level. We won't fix this here, but want to record it for documentation and awareness.
Expected behavior
When we review DPC from Search, we expect it to be the equivalent of count(distinct person_id) over all persons who have any of the codes in that hierarchy.
Actual behavior
Instead, achilles_result_concept_count sums the number of persons for each concept_id in the hierarchy, without removing duplicate person IDs, so a person with more than one code in the hierarchy is counted more than once. We noticed this while reviewing patients with Sickle Cell Anemia.
In our data, we have hundreds of patients with Sickle Cell-hemoglobin SS disease (concept_id = 22281), and the Atlas search says we have DPC = multiple thousand patients. However, when we create a simple cohort of anyone with code 22281 or its descendants, we get about two-thirds of the number we see in DPC.
Steps to reproduce behavior
When looking for unique patients with a concept or its children, we can do this, and get the right counts:
However, when we do a simplified replication of the query generated by Atlas Search, we get a duplicated patient count. Note that this is the code for generating the achilles_result_concept_count table (e.g. via https://atlas.ohdsi.org/WebAPI/ddl/achilles)
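The difference between the two approaches can be illustrated on toy data. The following is a minimal sketch, not the queries from the original report and not the actual Achilles/WebAPI DDL: it assumes simplified `concept_ancestor` and `condition_occurrence` tables in an in-memory SQLite database, with made-up concept IDs (1, 2, 3) and person IDs (10, 11, 12).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (
    ancestor_concept_id INTEGER,
    descendant_concept_id INTEGER
);
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER
);
-- Hypothetical hierarchy: concept 1 with descendants 2 and 3
-- (a concept is listed as its own descendant).
INSERT INTO concept_ancestor VALUES (1, 1), (1, 2), (1, 3);
-- Hypothetical records: person 10 has codes 1 and 2,
-- person 11 has codes 2 and 3, person 12 has code 3 twice.
INSERT INTO condition_occurrence VALUES
    (10, 1), (10, 2), (11, 2), (11, 3), (12, 3), (12, 3);
""")

# Person-level count: distinct persons with any code in the hierarchy.
distinct_sql = """
SELECT COUNT(DISTINCT co.person_id)
FROM condition_occurrence co
JOIN concept_ancestor ca
  ON ca.descendant_concept_id = co.condition_concept_id
WHERE ca.ancestor_concept_id = ?
"""

# Concept-level rollup: count distinct persons per concept first,
# then sum those per-concept counts over the hierarchy.
summed_sql = """
SELECT SUM(pc.person_count)
FROM concept_ancestor ca
JOIN (
    SELECT condition_concept_id AS concept_id,
           COUNT(DISTINCT person_id) AS person_count
    FROM condition_occurrence
    GROUP BY condition_concept_id
) pc ON pc.concept_id = ca.descendant_concept_id
WHERE ca.ancestor_concept_id = ?
"""

distinct_persons = conn.execute(distinct_sql, (1,)).fetchone()[0]
summed_persons = conn.execute(summed_sql, (1,)).fetchone()[0]
print(distinct_persons, summed_persons)  # 3 5
```

In this toy data, persons 10 and 11 each carry two codes from the hierarchy, so the summed rollup counts them twice (5) while only 3 distinct persons exist. Once only the per-concept counts are stored, the overlap can no longer be recovered, which is why the fix would have to happen where person-level rows are still available.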
Shouldn't DPC be the count of distinct persons with the selected code hierarchy?