Closed: clarkevans closed this issue 7 months ago.
This could be done with a lateral join, like this:
```julia
@funsql begin
    from(drug_exposure)
    cross_join(
        event => begin
            define(
                id => :ID,
                kind => "drug_exposure",
                start_date => :START_DATE)
            bind(
                :ID => drug_exposure_id,
                :START_DATE => drug_exposure_start_date)
        end)
end
```
Databricks should be able to optimize this join away.
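Here is why an optimizer could, in principle, remove this join. The correlated subquery produces exactly one row per outer row, so a lateral cross join against it neither adds nor drops rows. That makes it equivalent to simply computing the extra columns. The plain-Julia sketch below is an in-memory analogue, not FunSQL. The helper names `lateral_cross_join` and `event_subquery` are hypothetical, and so are the sample rows. The sketch checks this equivalence on two made-up `drug_exposure` rows:

```julia
using Dates

# In-memory analogue only (not FunSQL): hypothetical drug_exposure rows.
drug_exposure = [
    (drug_exposure_id = 1, drug_exposure_start_date = Date(2020, 1, 5)),
    (drug_exposure_id = 2, drug_exposure_start_date = Date(2021, 3, 9)),
]

# Lateral cross join: for every outer row, run a subquery that may read that
# row's columns, and pair the row with each row the subquery returns.
function lateral_cross_join(rows, name::Symbol, subquery)
    return [merge(row, NamedTuple{(name,)}((sub,))) for row in rows for sub in subquery(row)]
end

# Hypothetical "event" subquery: always exactly one row computed from the
# outer row, mirroring define(...) followed by bind(:ID => ..., :START_DATE => ...).
event_subquery(row) = [(id = row.drug_exposure_id, kind = "drug_exposure",
                        start_date = row.drug_exposure_start_date)]

joined = lateral_cross_join(drug_exposure, :event, event_subquery)

# Because the subquery yields one row per outer row, the join is the same as
# attaching the computed columns directly, with no join at all.
attached = [merge(row, (event = only(event_subquery(row)),)) for row in drug_exposure]

@assert joined == attached
println(joined[2].event.start_date)  # 2021-03-09
```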
This join could be wrapped in a definition:

```julia
@funsql create_event(; id, kind, start_date) =
    cross_join(
        event => begin
            define(
                id => :ID,
                kind => $kind,
                start_date => :START_DATE)
            bind(
                :ID => $id,
                :START_DATE => $start_date)
        end)
```
so that the query could be written simply as:

```julia
@funsql begin
    from(drug_exposure)
    create_event(
        id = drug_exposure_id,
        kind = "drug_exposure",
        start_date = drug_exposure_start_date)
end
```
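The parameterized definition makes generic code possible, and a rough in-memory analogue in plain Julia (again not FunSQL) shows why. Here the hypothetical `create_event` takes column *names* as `Symbol`s. The `started_before` helper and the sample rows are hypothetical too. A generic function reads only `row.event`, so it works unchanged for any table the definition is applied to:

```julia
using Dates

# Hypothetical in-memory counterpart of the create_event definition (not
# FunSQL). `id` and `start_date` name the table-specific OMOP columns;
# `kind` is a constant label, like the interpolated $kind in the definition.
function create_event(rows; id::Symbol, kind::String, start_date::Symbol)
    map(rows) do row
        event = (id = getproperty(row, id), kind = kind,
                 start_date = getproperty(row, start_date))
        merge(row, (event = event,))
    end
end

# Generic helper: reads only event fields, never OMOP-specific column names.
started_before(rows, cutoff::Date) = [r for r in rows if r.event.start_date < cutoff]

# Hypothetical sample rows from two different OMOP tables.
drug_exposure = [(drug_exposure_id = 1, drug_exposure_start_date = Date(2020, 1, 5))]
condition_occurrence = [(condition_occurrence_id = 7, condition_start_date = Date(2019, 6, 1))]

drugs = create_event(drug_exposure; id = :drug_exposure_id, kind = "drug_exposure",
                     start_date = :drug_exposure_start_date)
conditions = create_event(condition_occurrence; id = :condition_occurrence_id,
                          kind = "condition_occurrence", start_date = :condition_start_date)

cutoff = Date(2020, 1, 1)
println(length(started_before(drugs, cutoff)))       # 0
println(length(started_before(conditions, cutoff)))  # 1
```

The original OMOP columns stay intact on each row, so tables still display with their standard names. The `event` fields are only consulted by code that asks for them.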
Would you be able to make this change? We could then start migrating some of the functions to use the more generic approach rather than using prefixes.
Have you considered simply renaming the columns, stripping the table name from the names of columns?
It's tempting. However, when displaying tables, the column names would no longer match the OMOP schema, and I'd have to guess what changed and what is different. Initially I even started with a `condition()` function but ended up going with the more verbose `condition_occurrence()` just to stick with the standard. I was thinking that an `event` define that is silent unless it is used would be a very good compromise: it wouldn't change the names, but it would permit us to write generic functions.
Speaking of conditions and condition occurrences, there is also a material difference between the two, and both could be employed in the same query. In fact, instead of naming the link that corresponds to `condition_concept_id` `concept`, it could be called `condition`:

```julia
from(condition_occurrence)
join(person => from(person), person_id == person.person_id)
join(condition => from(concept), condition_concept_id == condition.concept_id)
...
```
There is one other limitation of OMOP names: there is no good way to represent a UNION of different types of events. Such unions occasionally appear in eCQMs, although I don't know if we will need them. Say, for instance, smoking can be represented both as an observation and as a condition; how would you show a list of all smoking-related events to the researcher?
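Once each table exposes the same event fields, such a union might look roughly like the sketch below. It is a plain-Julia in-memory analogue rather than FunSQL. In SQL it would correspond to a `UNION ALL` of the projected columns. The `to_events` helper, the concept IDs in `SMOKING_CONCEPTS`, and the sample rows are all hypothetical placeholders:

```julia
using Dates

# Placeholder concept IDs standing in for "smoking-related" concepts (hypothetical).
const SMOKING_CONCEPTS = Set([1001, 1002])

# Hypothetical sample rows from two OMOP tables with different column names.
condition_occurrence = [
    (condition_occurrence_id = 1, condition_concept_id = 1001, condition_start_date = Date(2021, 2, 3)),
    (condition_occurrence_id = 2, condition_concept_id = 2000, condition_start_date = Date(2021, 4, 1)),
]
observation = [
    (observation_id = 10, observation_concept_id = 1002, observation_date = Date(2020, 11, 20)),
]

# Project table-specific columns onto one shared event shape so that rows from
# different tables can be concatenated (the analogue of UNION ALL).
to_events(rows, kind, id, concept, date) =
    [(id = getproperty(r, id), kind = kind, concept_id = getproperty(r, concept),
      start_date = getproperty(r, date)) for r in rows]

events = vcat(
    to_events(condition_occurrence, "condition_occurrence",
              :condition_occurrence_id, :condition_concept_id, :condition_start_date),
    to_events(observation, "observation",
              :observation_id, :observation_concept_id, :observation_date))

# All smoking-related events, regardless of source table, in date order.
smoking = sort(filter(e -> e.concept_id in SMOKING_CONCEPTS, events), by = e -> e.start_date)
for e in smoking
    println(e.kind, " ", e.id, " ", e.start_date)
end
```

The `kind` column records which table each event came from, so the researcher can still trace an event back to its OMOP source.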
If we go with `event`, what fields should it contain?
When I started this project, I had the idea that I'd try to stick with OMOP columns. This was a bad decision, as you can see from the complexity of `join_via_cohort` in `linking.jl`: the bulk of this complexity is simply dealing with variation in column naming.
I started a refactor by adding an `event` self-join to each table, e.g. `drug_exposure()`. This would let me write functions generically, using `event.start_date` and such. However, this sucks: it adds an unnecessary join. So I was wondering how to do this better. I tried the following... or, perhaps
Then I could write
However, these don't seem to work; they fail with an error looking up `event`. The following works, but it's tedious... So this ticket is either to improve FunSQL to make nested definitions work, or to redo everything using the second form (but it's more brittle).