Closed akiellor closed 1 year ago
Does it handle the case where a low_cardinality_part value occurs more than once?
i.e.,
(low_cardinality_part = 1 and high_cardinality_part = 1) or (low_cardinality_part = 1 and high_cardinality_part = 2) or (low_cardinality_part = 2 and high_cardinality_part = 3)
is not the same as
low_cardinality_part in (1, 2) and high_cardinality_part in (1, 2, 3)
if we have a table like:
low high
1 1
1 2
2 3
2 1
It would pull (2,1) in the latter case but not the former.
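To make the difference concrete, here is a minimal plain-Ruby sketch that filters the example table both ways. It uses no database and no gem code; the variable names (`rows`, `keys`, `exact`, `independent`) are made up for illustration.

```ruby
# Rows from the example table, as [low, high] pairs.
rows = [[1, 1], [1, 2], [2, 3], [2, 1]]

# The keys actually being requested.
keys = [[1, 1], [1, 2], [2, 3]]

# Former predicate: an OR of exact (low, high) pairs.
exact = rows.select { |row| keys.include?(row) }

# Latter predicate: two independent IN lists, one per column.
lows = keys.map(&:first).uniq
highs = keys.map(&:last).uniq
independent = rows.select { |low, high| lows.include?(low) && highs.include?(high) }

p exact        # => [[1, 1], [1, 2], [2, 3]]
p independent  # => [[1, 1], [1, 2], [2, 3], [2, 1]]
```

The independent IN lists match the cross product of both columns, which is why (2, 1) slips through.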
@codeodor it does, but there is no spec. I'll add one.
@codeodor Actually there is a spec.
def test_in_with_multiple_primary_key_parts
  dep = Department.arel_table
  primary_keys = [[1, 1], [1, 2], [2, 3], [2, 4]]
  connection = ActiveRecord::Base.connection
  quoted_id_column = "#{connection.quote_table_name('departments')}.#{connection.quote_column_name('id')}"
  quoted_location_id_column = "#{connection.quote_table_name('departments')}.#{connection.quote_column_name('location_id')}"
  expected = "(#{quoted_id_column} = 1 AND #{quoted_location_id_column} IN (1, 2) OR #{quoted_id_column} = 2 AND #{quoted_location_id_column} IN (3, 4))"
  pred = cpk_in_predicate(dep, [:id, :location_id], primary_keys)
  assert_equal(with_quoted_identifiers(expected), pred.to_sql)
end
Note how the query ends up structured so that the high cardinality parts are grouped by the low cardinality parts and ORed together.
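For readers who want to see the grouping idea in isolation, here is a rough string-building sketch. `grouped_in_sql` is a hypothetical helper, not the gem's `cpk_in_predicate` (which works on Arel nodes and quotes identifiers), and it assumes every key part is an integer.

```ruby
# Hypothetical helper, NOT the gem's implementation: builds the grouped
# predicate as a plain string from [low, high] integer pairs.
def grouped_in_sql(low_col, high_col, keys)
  # Collect the high cardinality values under each low cardinality value
  # in a single pass over the key list.
  groups = keys.each_with_object({}) do |(low, high), acc|
    (acc[low] ||= []) << high
  end

  # One "low = x AND high IN (...)" clause per distinct low value.
  clauses = groups.map do |low, highs|
    in_list = highs.map { |h| Integer(h) }.join(', ')
    "#{low_col} = #{Integer(low)} AND #{high_col} IN (#{in_list})"
  end

  "(#{clauses.join(' OR ')})"
end

puts grouped_in_sql('id', 'location_id', [[1, 1], [1, 2], [2, 3], [2, 4]])
# => (id = 1 AND location_id IN (1, 2) OR id = 2 AND location_id IN (3, 4))
```

AND binds more tightly than OR in SQL, so the clauses need no inner parentheses.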
Thanks for the feedback y'all.
@cfis,
I've refactored the code a little, let me know what you think.
It was definitely possible to do all the work in a single pass over the list. I honestly hadn't given it much thought, as the actual query in Postgres was dominating the end-to-end execution time in our app, but this is better.
I went with method names different from your suggestions, but hopefully they reflect the intent.
I like this better, thanks for the changes, although I'm sure you are right that the query building and execution is where all the time is spent.
Added a second round of comments.
@cfis, I think this is ready for another round. Thanks for your feedback.
@akiellor nice! Thanks for pointing that out to me. I missed it on my first pass.
Great - thanks for making this happen!
@cfis, is this change likely to be backported to all the different ActiveRecord version releases? We are still using ActiveRecord 6.1, so it would be nice to get it there if possible.
Happy to merge an MR if you want to make one.
Also, I'll cut a new master branch release in the next few days.
Query generation for cpk_in_predicate now constructs queries in the form:
It used to generate queries in the form:
This change improves the queries by reducing their overall length, especially when loading many keys. More importantly, the new query will often result in Postgres performing an Index Scan instead of a Bitmap Heap Scan (assuming the right indices have been added).
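As a rough illustration of the length reduction, the sketch below compares two assumed query shapes for the same key list. The query snippets from the description are not reproduced here, so both shapes are assumptions: the pairwise form follows the question earlier in the thread, and the grouped form follows the spec. The column names `id` and `location_id` are borrowed from that spec.

```ruby
# Assumed shapes only; neither string is produced by the gem itself.
# 100 keys that all share a single low cardinality value.
keys = (1..100).map { |high| [1, high] }

# Pairwise form: one equality pair per key, ORed together.
pairwise = keys.map { |low, high| "(id = #{low} AND location_id = #{high})" }.join(' OR ')

# Grouped form: high cardinality values collected under each low value.
grouped_clauses = keys.group_by(&:first).map do |low, pairs|
  "id = #{low} AND location_id IN (#{pairs.map(&:last).join(', ')})"
end
grouped = grouped_clauses.join(' OR ')

puts "pairwise: #{pairwise.length} characters"
puts "grouped:  #{grouped.length} characters"
```

Whether Postgres actually picks an Index Scan depends on the indices and table statistics, so it is worth checking the plan with EXPLAIN against real data.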