Open katy-sadowski opened 4 months ago
Encounter ID bd0581d7-28e9-30e9-7e4a-846556992490 appears in dbt's visit_occurrence table but not ETL-Synthea's. Encounter ID d910aaaf-872f-dabd-c0f1-742eacdde64c appears in ETL-Synthea's visit_occurrence but not dbt's. I think it has to do with the way IDs are being assigned here and then prioritization rules are assigned here.
This is also impacting the cost table, which links into the visit tables to get cost information for events. Thus costs are being calculated differently, and there are some discrepancies in the presence/absence of cost rows, as events are being linked to different encounters in the 2 runs.
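One way to pin down which encounters differ between the two runs is a plain set difference over the encounter IDs. This is only a sketch: it assumes the IDs have already been exported from each run's visit_occurrence into Python lists, and `diff_encounter_ids` and the `shared-encounter` placeholder are hypothetical names, not anything from either project.

```python
def diff_encounter_ids(etl_ids, dbt_ids):
    """Return the encounter IDs that appear in only one of the two runs."""
    etl, dbt = set(etl_ids), set(dbt_ids)
    return {
        "only_in_etl": sorted(etl - dbt),
        "only_in_dbt": sorted(dbt - etl),
    }


# The two IDs reported above, plus one made-up ID present in both runs.
etl_ids = ["shared-encounter", "d910aaaf-872f-dabd-c0f1-742eacdde64c"]
dbt_ids = ["shared-encounter", "bd0581d7-28e9-30e9-7e4a-846556992490"]

result = diff_encounter_ids(etl_ids, dbt_ids)
print(result)
```

The same comparison on the cost table's visit links should show whether the cost discrepancies trace back to these encounters.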
Having trouble replicating this issue! When I compare both tables they look the same.
I have probably messed up though!
I do agree the logic could probably be cleaned up
huh! i'm honestly not 100% sure how the logic works so need to dig in further to determine what the issue is... i think this can be accomplished via the refactors i'd like to do to move from the ETL-Synthea SQL into something more "dbt-esque". which i'd like to start on soon 😃
To be honest neither do I! but yes definitely worth doing!! 😄
I have a new theory. I'm cleaning up typecasting, and realized that because ETL-Synthea truncates datetimes to dates and dbt-synthea does not, the visit logic is going to work differently. There are several comparisons of visit dates here which in ETL-Synthea compare dates, and in dbt-synthea compare datetimes.
TBD if it's better to use dates or datetimes for the purpose of this logic. I still need to dive in and understand exactly what's going on in this model.
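To illustrate why truncation matters, here is a small sketch of the same comparison giving different answers on dates and on datetimes. The rule in `starts_within` is a made-up stand-in, not the actual comparison in the visit model, and the timestamps are invented.

```python
from datetime import datetime


def starts_within(prev_end, next_start):
    """Hypothetical rule: the next visit begins on or before the previous one ends."""
    return next_start <= prev_end


# Two same-day encounters: one ends at noon, the next starts at 15:00.
prev_end = datetime(2020, 1, 1, 12, 0)
next_start = datetime(2020, 1, 1, 15, 0)

# Comparing full datetimes (what dbt-synthea would be doing under this theory).
as_datetimes = starts_within(prev_end, next_start)

# Comparing truncated dates (what ETL-Synthea would be doing under this theory).
as_dates = starts_within(prev_end.date(), next_start.date())

print(as_datetimes, as_dates)
```

With datetimes the second encounter falls outside the first; with dates the two land on the same day and the rule matches, so the encounters could be grouped into different visits in each run.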
> I have a new theory. I'm cleaning up typecasting, and realized that because ETL-Synthea truncates datetimes to dates and dbt-synthea does not, the visit logic is going to work differently.
Good catch - I completely missed that!
I observed differences between the ETL-Synthea output and the dbt-synthea output (1 visit existed in ETL-Synthea that didn't exist in dbt-synthea, and vice versa). It appears this might be due to the fact that in some cases an arbitrary record is chosen from among 2 or more records in the process of populating visit_occurrence. I'm still not 100% sure what is going on, but things to look into include:
In general the visit logic feels very complicated; I wonder if there is a simpler way to generate IDs than what's being done here. Let's look into this as well :)
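If the arbitrary-record theory holds, one possible fix is to make the choice deterministic by adding a unique tiebreaker to the ordering. The sketch below is an assumption-laden illustration, not the model's logic: `pick_visit`, the `start` and `encounter_id` field names, and the sample rows are all hypothetical.

```python
def pick_visit(candidates):
    """Choose one record deterministically: earliest start, then smallest encounter ID."""
    return min(candidates, key=lambda r: (r["start"], r["encounter_id"]))


# Two made-up records that tie on start date.
rows = [
    {"encounter_id": "b", "start": "2020-01-01"},
    {"encounter_id": "a", "start": "2020-01-01"},
]

chosen = pick_visit(rows)
chosen_reversed = pick_visit(list(reversed(rows)))
print(chosen["encounter_id"], chosen_reversed["encounter_id"])
```

Because the encounter ID breaks ties, the result no longer depends on the order in which the rows arrive, which is the property both pipelines would need in order to agree.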