Closed xitology closed 6 months ago
Just ran into this issue. If I understand correctly, ~binary operators in programming languages are typically left associative~ (associativity is not relevant, what matters is if we store relationships in the parent node, or the child node), so rebasing q >> Where(q.yob ...)
produces a copy of the node q
to put in the final query tree, while the original q
is left detached.
It is an interesting conundrum. When the bound reference is created, q
could also store a reference to Get.yob
, to go back and update it whenever q
is rebased. However, this makes tabular nodes stateful which might be problematic when reusing nodes. Do you see any other way around it?
So, this error is not limited to bound references. Consider the following example.
table = SQLTable(:table, columns=[:x, :y])
function self_join()
q = Where(Get.x .== 100)
return q |> Join(q |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y)
end
query = From(table) |> self_join()
print(render(query))
# ERROR: FunSQL.ReferenceError: cannot find x in:
# let table = SQLTable(:table, …),
# q1 = From(table),
# q2 = Get.x,
# q3 = Lit(100),
# q4 = Fun."=="(q2, q3),
# q5 = q1 |> Where(q4),
# q6 = Where(q4),
# q7 = q5 |> Join(q6 |> As(:alias), Fun."=="(Get.x, Get.alias.x)),
# q8 = q7 |> Select(Get.y)
# q8
# end
So, the issue really seems to be that FunSQL queries are not trees but DAGs. When we rebase the query pipeline A-B-C
, we leave node E
detached.
I haven't found a nice way to solve this yet.
Tracking bound references with a tabular node as parent in the parent isn't sufficient, since Join
nodes see the same error.
The simple solution is to make the query tree mutable and make rebase
an operation that just descends down the line of parent nodes, and swaps the null value at the end. That is, X |> (A |> B |> C)
just replaces the parent node for A.
However, now we need to ensure query fragments are never reused, which is not nice. The neat thing about FunSQL is that the data structures are kind of immutable, and can be freely composed.
Visit and collect all SQLNodes in the query tree that need to be rebased; go over them in the topological order, i.e A, B, E, C
in the diagram and make a copy of each.
Given each SQLNode has multiple attributes that could be nodes themselves, this would be an expensive and tedious.
What do you think?
It is an interesting conundrum. When the bound reference is created,
q
could also store a reference toGet.yob
, to go back and update it wheneverq
is rebased. However, this makes tabular nodes stateful which might be problematic when reusing nodes. Do you see any other way around it?
This is the place where FunSQL is a bit inconsistent. In general, FunSQL ignores the node identity, which means that any node in a query can be replaced with its structural copy. However, this is not valid when the node has a bound reference. In particular, a bound reference can be invalidated by query composition (|>
) because composition rewrites its right argument.
q = a |> b |> Where(b.x)
Here, composition a |> b
deconstructs the original object b
, which invalidates the reference b.x
.
This issue can be fixed by having composition preserve a reference to the original node, but is it worth it considering that the problem is pretty obscure? An easy workaround is to rewrite the query like this:
b = a |> b
q = b |> Where(b.x)
Hmm, that is a good point; it indeed is not a common occurrence. Though it goes against the FunSQL idea a bit, that sql queries written in modular functions can be composed freely.
For example, I can imagine people factoring out self-joins as a separate function. If the output node from self_join
ever gets rebased, the right hand side of the join will be left detached.
function self_join(q, col_name)
return q |> Join(q |> As(:alias), on=Get(col_name) .== Get.alias >> Get(col_name))
end
The solution is making sure users aware how rebase works, which I feel is an abstraction leak.
This issue can be fixed by having composition preserve a reference to the original node
The downside, besides implementation time, is added maintenance due to more complex code base?
So, this error is not limited to bound references. Consider the following example.
table = SQLTable(:table, columns=[:x, :y]) function self_join() q = Where(Get.x .== 100) return q |> Join(q |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y) end query = From(table) |> self_join() print(render(query)) # ERROR: FunSQL.ReferenceError: cannot find x in: # let table = SQLTable(:table, …), # q1 = From(table), # q2 = Get.x, # q3 = Lit(100), # q4 = Fun."=="(q2, q3), # q5 = q1 |> Where(q4), # q6 = Where(q4), # q7 = q5 |> Join(q6 |> As(:alias), Fun."=="(Get.x, Get.alias.x)), # q8 = q7 |> Select(Get.y) # q8 # end
So, the issue really seems to be that FunSQL queries are not trees but DAGs. When we rebase the query pipeline
A-B-C
, we leave nodeE
detached.I haven't found a nice way to solve this yet.
1. Tracking bound references with a tabular node as parent in the parent isn't sufficient, since `Join` nodes see the same error. 2. The simple solution is to make the query tree _mutable_ and make `rebase` an operation that just descends down the line of parent nodes, and swaps the null value at the end. That is, `X |> (A |> B |> C)` just replaces the parent node for A. However, now we need to ensure query fragments are never reused, which is not nice. The neat thing about FunSQL is that the data structures are kind of immutable, and can be freely composed. 3. Visit and collect all SQLNodes in the query tree that need to be rebased; go over them in the topological order, i.e `A, B, E, C` in the diagram and make a copy of each. Given each SQLNode has multiple attributes that could be nodes themselves, this would be an expensive and tedious.
What do you think?
You cannot construct this DAG by reusing the object q
. This is because FunSQL ignores the identity of the node objects (bound references are an exception), which means any node in a query could be replaced with a copy without changing the meaning of the query. In your example, it means you could equivalently write
q1 = Where(Get.x .== 100)
q2 = Where(Get.x .== 100)
return q1 |> Join(q2 |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y)
and it must produce exactly the same query.
One way to solve this is by moving the variable q
to the FunSQL level, that is, using With
to create a tabular "variable":
From(:q) |>
Join(From(:q) |> As(:alias), on = Get.x .== Get.alias.x)) |>
Select(Get.y) |>
With(From(table) |> Where(Get.x .== 100) |> As(:q))
This does not yet let you write a self_join()
function. To make self_join()
possible, we need a variant With′
(need a better name) of With
that exchanges the order of arguments: q1 |> With(q2)
is equivalent to q2 |> With′(q1)
. Then you can write
self_join() =
Where(Get.x .== 100) |>
As(:q) |>
With′(
From(:q) |>
Join(From(:q) |> As(:alias), on = Get.x .== Get.alias.x) |>
Select(Get.y))
From(table) |> self_join()
You cannot construct this DAG by reusing the object q. This is because FunSQL ignores the identity of the node objects (bound references are an exception), which means any node in a query could be replaced with a copy without changing the meaning of the query. In your example, it means you could equivalently write
Right, would you consider "a node can be replaced with a structural copy" to be an implementation detail?
It definitely makes query tree transformations easy and error surface smaller by making nodes immutable, though I would have liked the self-join query to work :).
Right, would you consider "a node can be replaced with a structural copy" to be an implementation detail?
This is just a different way to say algebraic or referentially transparent.
Bound references are no longer supported in FunSQL v0.13.0.