MechanicalRabbit / FunSQL.jl

Julia library for compositional construction of SQL queries
https://mechanicalrabbit.github.io/FunSQL.jl
Other
144 stars 5 forks source link

Adding multiple qualifiers to table results in joins being subqueries #60

Closed ansaardollie closed 3 months ago

ansaardollie commented 3 months ago

Hi there,

Firstly just want to say a huge thank you for making this library. It really is the perfect solution to my current needs.

I am currently working on a project which uses multiple catalogues/schemas within a single datasource. When trying to implement this with the FunSQL library I get an unexpected behaviour for joins. Using the Join example found in the API reference for this library the following sql is rendered when no qualifiers are used on a SQLTable object:

using FunSQL: SQLTable, render, From, Join, Select, Get

person0 = SQLTable(:person, columns=[:person_id, :location_id])
location0 = SQLTable(:location, columns=[:location_id, :state])

query = From(:person) |> Join(:location => From(:location), Get.location_id .== Get.location.location_id) |> Select(Get.person_id, Get.location.state)

println(render(query, tables=[person0, location0]))

Output:

SELECT
  "person_1"."person_id",
  "location_1"."state"
FROM "person" AS "person_1"
JOIN "location" AS "location_1" ON ("person_1"."location_id" = "location_1"."location_id")

This is the expected behaviour, namely that the Join clause created joins directly to the table. This does not happen when using 2 or more qualifiers (the situation I find myself in: one for the DB and another for the schema).

person2 = SQLTable(:person, qualifiers=["EXAMPLE_DB", "EXAMPLE_SCHEMA"], columns=[:person_id, :location_id])
location2 = SQLTable(:location, qualifiers=["EXAMPLE_DB", "EXAMPLE_SCEHMA"], columns=[:location_id, :state])

println(render(query, tables=[person2, location2]))

Output:

SELECT
  "person_1"."person_id",
  "location_2"."state"
FROM "EXAMPLE_DB"."EXAMPLE_SCHEMA"."person" AS "person_1"
JOIN (
  SELECT
    "location_1"."location_id",
    "location_1"."state"
  FROM "EXAMPLE_DB"."EXMAPLE_SCHEMA"."location" AS "location_1"
) AS "location_2" ON ("person_1"."location_id" = "location_2"."location_id")

As you can see, whenever one uses 2+ qualifiers then the join clause is created using an inner/nested/sub query. While I am not very experienced in DB tuning, I don't think that is as efficient as a direct join.

I am also unsure if this expected behaviour for the library and perhaps there is a configuration setting I can change to amend the behaviour.

Thank you :)

xitology commented 3 months ago

Good catch. It's caused by an unnecessarily specific condition on collapsing the join branch and should be fixed by https://github.com/MechanicalRabbit/FunSQL.jl/commit/971b299a5eb5f305f13abd1b6a576009b9cd326f. However, I'd expect that the database engine would reduce both variants to the same query plan, so it should not affect the query performance.

ansaardollie commented 3 months ago

Thank you so much for your prompt fix.