Closed amw-zero closed 3 years ago
@blambeau any thoughts here?
@amw-zero first of all, sorry for the delay. I haven’t received a notification or email when you opened the PR.
It’s (most probably) a bug, probably related to the syntactic sugar allowing to pass a join hash. The thing is that the key => value
order in such hash is both very important ... and very counterintuitive
I don’t even remember myself what is the correct order, but I bet the second is right, not the first one. Since it’s syntactic sugar for:
transactions.join(tags.restrict(name: 'income').allbut([:id]).rename(:transaction_id => :id)).
Can you tell me if that works well?
Still, that does NOT explain why the results are different (except possibly if the query is actually wrong, since behavior is undefined in such case). A few tools that can help you get to the bottom of it:
r.debug # can be used anywhere to inspect the query tree
r.with_typecheck # not sure exactly (from memory), but the feature exists and is supposed to detect wrong queries and complain
No problem about the delay, I figured that was the case.
The query that you suggested didn't exactly work, but works with a tiny modification (explicitly specifying the column to join on, i.e. [:id]
). And more importantly it made me realize what the problem with my initial query was: both relations have an id
column. You used .allbut([:id])
on the tags
relation so that transaction_id
could be renamed to id
without a collision, and that name collision seems to be what's causing the problem with renaming.
This is the query that's working just fine for me now:
transactions.join(
tags.restrict(name: 'income').allbut([:id]),
{ id: :transaction_id }
)
I do agree though, ideally a query returns the same result no matter what the backend is (memory, sql, etc.). In my use case, I'm using memory relations during testing to test the logic of my queries. Doing this has been very convenient so far, since I don't need to create arbitrary repository objects. But this was the first problem I ran into where the memory relation and sql relations were inconsistent.
Hi @amw-zero, I wrote https://github.com/enspirit/bmg/pull/22 to make sure that our guesses and findings are right. They are.
The original query is actually wrong (rename
is not expected to override an existing attribute). Indeed, in such case, the SQL and Memory results are different. Possibly that's a problem.
Various corrected approaches work fine. I suggest you to execute the rspec and look at the resulting SQL. You might be surprised to see the generated DISTINCT
with a join
strategy, and could be interested in the matching
approach instead.
I checked that the type checker complains about the original query: it does, confirming 1.
I'm not completely sure it's worth fixing the bug. Yet, possibly the join/matching syntactic sugar could go a bit further when attributes are known, and generate the necessary allbut
for you. WDYT?
That PR is extremely helpful. Many thanks. I was unaware of the typechecking capability, and since that catches the error I think that is totally fine. I can arrange my tests so that they type check the memory relations and hopefully that will catch an error like this earlier on.
Ultimately I was just writing an ambiguous query! So I do agree that fixing the "bug" may not be worth it. This case is quite common (joining 2 tables with an id
column), but implicitly adding the allbut
might be too overbearing.
Thanks so much for the help! I'm really enjoying using the library so far, and will definitely report any other issues that I find.
I came across a surprising behavior, and I wanted to verify that it is in fact a bug before attempting to fix (I'd be more than happy to submit a patch if this is agreed to be a bug). The bug is that the same query run on an
InMemory::Relation
and aSql::Relation
returns different results.Consider the following relations:
transactions
is modeling some financial transactions, andtags
is modeling the ability to tag a transaction with some semantic name, like "income." I'm looking to query all transactions for a given tag. I expressed this with the following query:transactions.join(tags.restrict(name: 'income'), { id: :transaction_id })
.Running this on these in-memory relations returns the right result:
I get back the two transactions who are tagged as "income."
If you create the same relation in postgres and run the query on relations produced with Bmg.sequel(:transactions, ...), the query returns no results. This is because it produces the following SQL:
If you look at the join clause, the ON condition is:
(t1.id = t2.id)
, but by specifying{ id: :transaction_id }
in the BMGjoin
operator, I expected the produced ON condition to be:(t1.id = t2.transaction_id)
.I can get the produced SQL to be what I expect by reversing the join, i.e. by doing:
tags.restrict(name: 'income').join(transactions, { transaction_id: :id })
.But because joins in relational algebra are commutative, I would expect both of these to produce the same result. I may be misunderstanding the Hash parameter that gets passed into the
join
operator though.I created a repo that sets up and exposes the bug if someone would like to play around with it in more depth: https://github.com/amw-zero/bmg_query_bug.
Is this a bug or am I using the
join
operator incorrectly?