MechanicalRabbit / FunSQL.jl

Julia library for compositional construction of SQL queries
https://mechanicalrabbit.github.io/FunSQL.jl
Other
146 stars 6 forks source link

`Get` attributes versus table attributes #4

Closed jtrakk closed 3 years ago

jtrakk commented 3 years ago

Hello, I'm really glad this project is starting to exist. I've been checking out the documentation and trying to learn about it.

One part of the design that I'm curious about is I'm interested in why Get is used.

In the first query example, we have

const person =
    SQLTable(:person, columns = [:person_id, :year_of_birth, :location_id])
const location =
    SQLTable(:location, columns = [:location_id, :city, :state])
const visit_occurrence =
    SQLTable(:visit_occurrence, columns = [:visit_occurrence_id, :person_id, :visit_start_date])

q = person |>
    Where(Get.year_of_birth .<= 1950) |>
    Join(:location => location,
         on = Get.location_id .== Get.location.location_id) |>
    Where(Get.location.state .== "IL") |>
    Join(:visit_group => visit_occurrence |>
                         Group(Get.person_id),
         on = Get.person_id .== Get.visit_group.person_id,
         left = true) |>
    Select(Get.person_id,
           :max_visit_start_date =>
               Get.visit_group |> Agg.max(Get.visit_start_date))

Why is it designed to use Get.visit_start_date instead of visit_occurrence.visit_start_date as it would appear in SQL? Likewise, what is the advantage of an API that wants Get.location.location_id instead of location.location_id? Is this style inspired by some other query builder?

xitology commented 3 years ago

I'm glad you are interested in FunSQL. I'll try to clarify some of the design decisions.

Using a SQLTable object for attribute lookup is problematic for several reasons.

  1. It could be ambiguous if the table is used in a query more than once.
  2. It doesn't work with user-defined attributes created with Define(). Here is an example: https://github.com/MechanicalRabbit/FunOHDSI.jl/blob/master/demo/ex-10-2.jl#L96-L97. It will also not work with some SQL operations such as UNION ALL, where the same attribute may come from two different tables.
  3. It makes it impossible to write query fragments that expect certain attributes, but don't really care where they are coming from. The CollapseInterval() function in https://github.com/MechanicalRabbit/FunOHDSI.jl/blob/master/demo/ex-10-2.jl#L197-L206 is a good example. It expects the attributes person_id, start_date, and end_date to be available, but doesn't care which one of the event tables they are coming from.

Even so, it's possible to be more explicit with attribute lookup, but instead of using SQLTable, you'd use attribute lookup on SQLNode objects like this:

q1 = From(person)
q2 = q1 |> Where(q1.year_of_birth .<= 1950)
q3 = From(location)
q4 = q2 |> Join(q3, q1.location_id .== q3.location_id) # `q2.location_id .== q3.location_id` also works
q5 = q4 |> Where(q3.state .== "IL")
...

The approach is inspired by XPath and XPath-like query languages. For example, imagine that the first intermediate Join query returns a nested record with fields (person_id, year_of_birth, location_id, location = (location_id, city, state)). Then expressions such as Get.year_of_birth or Get.location.state could be seen as extracting the components of this record.

jtrakk commented 3 years ago

I think I understand. Thanks for explaining!