Closed DeD1rk closed 10 months ago
Sorry for the late response, but it's the Christmas holidays after all.
This definitely looks weird and I can't really tell how or why it happens yet either. I'm also going to try and reproduce the issue in the test setup to get a better look at it (thanks for the example). Though it might take a bit as it's still the holidays. If you manage to reproduce the issue in a simpler setup, then I'd be happy to see that example as well.
Thanks, yes I'll try to find out more. I'm thinking it should be pretty straightforward to find out how it works from the SQL statements. I'll see if I can strip them down further. Happy holidays!
I don't have easier example data yet, but I tracked down what I think causes the issue.
Here's a diff between (essentially; I removed all of the fields with .values_list() and filled in the constants, the issue remained the same) the normal correct query and the .exists()
one:
```sql SELECT '1' AS "a" FROM "events_eventregistration" INNER JOIN "events_event" ON ( "events_eventregistration"."event_id" = "events_event"."id" ) LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ( "events_eventregistration"."date_cancelled" IS NULL AND ( "events_event"."start" AT TIME ZONE 'Europe/Amsterdam' )::date >= '2023-09-01' AND CASE WHEN ( "events_eventregistration"."special_price" IS NOT NULL ) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > 0 ) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END HAVING CASE WHEN ( "events_eventregistration"."date_cancelled" IS NULL ) THEN NULLIF( GREATEST( ( COUNT(T3."id") FILTER ( WHERE ( T3."date_cancelled" IS NULL AND ( T3."date" < ("events_eventregistration"."date") OR ( T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date") ) ) ) ) - "events_event"."max_participants" ), 0 ), 0 ) ELSE NULL END IS NULL LIMIT 1 ``` ```sql SELECT "events_eventregistration"."id" FROM "events_eventregistration" INNER JOIN "events_event" ON ( "events_eventregistration"."event_id" = "events_event"."id" ) LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ( "events_eventregistration"."date_cancelled" IS NULL AND ( "events_event"."start" AT TIME ZONE 'Europe/Amsterdam' )::date >= '2023-09-01' AND CASE WHEN ( "events_eventregistration"."special_price" IS NOT NULL ) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > 0 ) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END, "events_eventregistration"."id" HAVING CASE WHEN ( "events_eventregistration"."date_cancelled" IS NULL ) THEN NULLIF( GREATEST( ( COUNT(T3."id") FILTER ( WHERE ( T3."date_cancelled" IS NULL AND ( T3."date" < ("events_eventregistration"."date") OR ( T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date") ) ) ) ) - "events_event"."max_participants" ), 0 ), 0 ) ELSE NULL END IS NULL ```
The difference in the GROUP BY
clause is surprising. So I tried adding the "events_eventregistration"."id"
grouping key into the .exists()
query, and indeed that solves the problem.
So I'm thinking it has to have something to do with the | Q(event__eventregistration__id__lte=F("id"))
part in
queue_position = AnnotationProperty(
Case(
# Get queue position by counting amount of registrations with lower date and in case of same date lower id
# Subsequently cast to None if this is 0 or lower, in which case it isn't in the queue
# If the current registration is cancelled, also force it to None.
When(
date_cancelled=None,
then=NullIf(
Greatest(
Count(
"event__eventregistration",
filter=Q(event__eventregistration__date_cancelled=None)
& (
Q(event__eventregistration__date__lt=F("date"))
| Q(event__eventregistration__id__lte=F("id"))
& Q(event__eventregistration__date__exact=F("date"))
),
)
- F("event__max_participants"),
0,
),
0,
),
),
default=None,
)
)
Update: I minimized the difference in GROUP BY like this:
I set queue_position = AnnotationProperty(Count("event__eventregistration") - F("event__max_participants"))
to simplify the query as much as possible while keeping the difference in the GROUP BY
between the normal and .exists() queries. Here's the result:
-- EventRegistration.objects.select_properties("queue_position").filter(queue_position__isnull=True).values_list("pk").query.get_compiler(using="default").as_sql()
SELECT "events_eventregistration"."id"
FROM "events_eventregistration"
INNER JOIN "events_event" ON (
"events_eventregistration"."event_id" = "events_event"."id"
)
LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id")
GROUP BY "events_event"."max_participants",
"events_eventregistration"."id"
HAVING (
COUNT(T3."id") - "events_event"."max_participants"
) IS NULL;
-- EventRegistration.objects.select_properties("queue_position").filter(queue_position__isnull=True).values_list("pk").query.exists().get_compiler(using="default").as_sql()
SELECT %s AS "a"
FROM "events_eventregistration"
INNER JOIN "events_event" ON (
"events_eventregistration"."event_id" = "events_event"."id"
)
LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id")
GROUP BY "events_event"."max_participants"
HAVING (
COUNT(T3."id") - "events_event"."max_participants"
) IS NULL
LIMIT 1;
I'll look into how that query is built next.
I still didn't have the time to dive deeper into this, but at a glance I can kind of understand the difference in the GROUP BY
clauses as my guess would be that Django only adds the id
column if it's actually selected since any column in a grouped query must either be part of the GROUP BY
clause or an aggregate. Since no real fields are selected in the EXISTS
query, Django probably also doesn't add them to the GROUP BY
clause.
As an aside: the problem being related to aggregate queries and fixed by adding a grouping key does remind me of this known "issue", but that still wouldn't explain the difference between the regular and the EXISTS
query.
One thing you could try with your setup is to perform the same queries without django-queryable-properties, i.e. do not use a QueryablePropertiesManager
/QueryablePropertiesQuerySet
so that there's no django-queryable-properties code involved and add the annotations represented by the properties the regular way using .annotate()
or .alias()
. If that yields the same results, then the issue would be Django's construction of EXISTS
queries in such cases. If not, then django-queryable-properties really is the problem and I'm going to have to find the underlying cause (which would be interesting since all the library does is somewhat intelligently performing the same thing .annotate()
does automatically, without any specific logic regarding EXISTS
queries).
The GROUP BY field is missing because: SQLCompiler.get_group_by()
determines the grouping fields, based on Query.group_by
(this is empty for my case), what's needed for Query.select
(which causes the eventregistration id to be added in the normal query, but not in the .exists() one), and what's needed for Query.having
(which turns out to only add GROUP BY "events_event"."max_participants"
).
I find it hard to say where it's going wrong. It could be seen as both:
Query.has_results()
: one could argue that function is responsible for adding to Query.group_by
to make up for removing from Query.select
.SQLCompiler.get_group_by()
or perhaps WhereNode.get_group_by_cols()
: you could also say one of these functions needs to be aware of the fact that they're being used with a HAVING clause.In any case, it seems like the problem is not actually on django-queryable-properties' side. I can reproduce the difference between the queries without queryable-properties easily like this:
>>> EventRegistration.objects.annotate(queue_position=Count("event__eventregistration") - F("event__max_participants")).filter(queue_position__isnull=True).values_list("pk").query.exists().get_compiler(using="default").as_sql()
('SELECT %s AS "a" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") GROUP BY "events_event"."max_participants" HAVING (COUNT(T3."id") - "events_event"."max_participants") IS NULL LIMIT 1', (1,))
>>> EventRegistration.objects.annotate(queue_position=Count("event__eventregistration") - F("event__max_participants")).filter(queue_position__isnull=True).values_list("pk").query.get_compiler(using="default").as_sql()
('SELECT "events_eventregistration"."id" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") GROUP BY "events_event"."max_participants", "events_eventregistration"."id" HAVING (COUNT(T3."id") - "events_event"."max_participants") IS NULL', ())
I'll make a cleaner reproduction like that and create a ticket on Django. I guess it's alright to close the issue here.
I'm going to close this as you suggested. If there's any development that indicates that a bug in django-queryable-properties is involved here, let me know and I can re-open the issue.
I encountered the following bug. I'm not sure if it's related to django-queryable-properties but it seems likely to me:
Clearly,
qs.exists()
is meant to be semantically equivalent toqs.count() > 0
, so this was quite a surprise. Removing any of the 4 filters above solved the problem (count becomes greater, exists becomes true).Here are the queries that end up being used:
And the model definition (many fields left out):
There's a simple workaround (to just use the
.count()
instead) but I would love to find out why this goes wrong. I will try to reproduce this in a simpler setting once I find some time for it.