W1ldPo1nter / django-queryable-properties

Write Django model properties that can be used in database queries.
BSD 3-Clause "New" or "Revised" License
72 stars 1 forks source link

Queryset.exists() == False but queryset.count() > 0 #13

Closed DeD1rk closed 10 months ago

DeD1rk commented 10 months ago

I encountered the following bug. I'm not sure if it's related to django-queryable-properties but it seems likely to me:

>>> event_registrations = EventRegistration.objects.select_properties(
...     "queue_position", "payment_amount",
... ).filter(
...     event__start__date__gte="2023-09-01",
...     date_cancelled__isnull=True,
...     queue_position__isnull=True,
...     payment_amount__gt=0,
... )
>>> event_registrations.exists()
False
>>> event_registrations.count()
407
>>> event_registrations.exists()
False

Clearly, qs.exists() is meant to be semantically equivalent to qs.count() > 0, so this was quite a surprise. Removing any of the 4 filters above solved the problem (count becomes greater, exists becomes true).

Here are the queries that end up being used:

>>> event_registrations.query.exists().get_compiler(using="default").as_sql()
('SELECT %s AS "a" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ("events_eventregistration"."date_cancelled" IS NULL AND ("events_event"."start" AT TIME ZONE %s)::date >= %s AND CASE WHEN ("events_eventregistration"."special_price" IS NOT NULL) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > %s) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END HAVING CASE WHEN ("events_eventregistration"."date_cancelled" IS NULL) THEN NULLIF(GREATEST((COUNT(T3."id") FILTER (WHERE (T3."date_cancelled" IS NULL AND (T3."date" < ("events_eventregistration"."date") OR (T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date"))))) - "events_event"."max_participants"), %s), %s) ELSE NULL END IS NULL LIMIT 1', (1, 'Europe/Amsterdam', datetime.date(2023, 9, 1), Decimal('0'), 0, 0))
>>> event_registrations.query.get_compiler(using="default").as_sql()
('SELECT "events_eventregistration"."id", "events_eventregistration"."event_id", "events_eventregistration"."member_id", "events_eventregistration"."name", "events_eventregistration"."alt_email", "events_eventregistration"."alt_phone_number", "events_eventregistration"."date", "events_eventregistration"."date_cancelled", "events_eventregistration"."present", "events_eventregistration"."special_price", "events_eventregistration"."remarks", "events_eventregistration"."payment_id", CASE WHEN "events_eventregistration"."date_cancelled" IS NULL THEN NULLIF(GREATEST((COUNT(T3."id") FILTER (WHERE (T3."date_cancelled" IS NULL AND (T3."date" < ("events_eventregistration"."date") OR (T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date"))))) - "events_event"."max_participants"), %s), %s) ELSE NULL END AS "queue_position", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END AS "payment_amount" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ("events_eventregistration"."date_cancelled" IS NULL AND ("events_event"."start" AT TIME ZONE %s)::date >= %s AND CASE WHEN ("events_eventregistration"."special_price" IS NOT NULL) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > %s) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", 14, "events_eventregistration"."id" HAVING CASE WHEN ("events_eventregistration"."date_cancelled" IS NULL) THEN NULLIF(GREATEST((COUNT(T3."id") FILTER (WHERE (T3."date_cancelled" IS NULL AND (T3."date" < ("events_eventregistration"."date") OR (T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date"))))) - "events_event"."max_participants"), %s), %s) ELSE NULL END IS NULL', (0, 0, 'Europe/Amsterdam', datetime.date(2023, 9, 1), Decimal('0'), 0, 0))

And the model definition (many fields left out):


class EventRegistration(models.Model):
    objects = QueryablePropertiesManager()

    event = models.ForeignKey(Event, models.CASCADE)

    ...

    date = models.DateTimeField(_("registration date"), default=timezone.now)
    date_cancelled = models.DateTimeField(_("cancellation date"), null=True, blank=True)

    special_price = PaymentAmountField(
        verbose_name=_("special price"),
        blank=True,
        null=True,
        validators=[validators.MinValueValidator(0)],
    )

    payment = models.OneToOneField(
        "payments.Payment",
        related_name="events_registration",
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )

    queue_position = AnnotationProperty(
        Case(
            # Get queue position by counting amount of registrations with lower date and in case of same date lower id
            # Subsequently cast to None if this is 0 or lower, in which case it isn't in the queue
            # If the current registration is cancelled, also force it to None.
            When(
                date_cancelled=None,
                then=NullIf(
                    Greatest(
                        Count(
                            "event__eventregistration",
                            filter=Q(event__eventregistration__date_cancelled=None)
                            & (
                                Q(event__eventregistration__date__lt=F("date"))
                                | Q(event__eventregistration__id__lte=F("id"))
                                & Q(event__eventregistration__date__exact=F("date"))
                            ),
                        )
                        - F("event__max_participants"),
                        0,
                    ),
                    0,
                ),
            ),
            default=None,
        )
    )

    @queryable_property
    def payment_amount(self):
        return self.event.price if not self.special_price else self.special_price

    @payment_amount.annotater
    @classmethod
    def payment_amount(cls):
        return Case(
            When(Q(special_price__isnull=False), then=F("special_price")),
            default=F("event__price"),
        )

There's a simple workaround (to just use the .count() instead) but I would love to find out why this goes wrong. I will try to reproduce this in a simpler setting once I find some time for it.

W1ldPo1nter commented 10 months ago

Sorry for the late response, but it's the Christmas holidays after all.

This definitely looks weird and I can't really tell how or why it happens yet either. I'm also going to try and reproduce the issue in the test setup to get a better look at it (thanks for the example). Though it might take a bit as it's still the holidays. If you manage to reproduce the issue in a simpler setup, then I'd be happy to see that example as well.

DeD1rk commented 10 months ago

Thanks, yes I'll try to find out more. I'm thinking it should be pretty straightforward to find out how it works from the SQL statements. I'll see if I can strip them down further. Happy holidays!

DeD1rk commented 10 months ago

I don't have easier example data yet, but I tracked down what I think causes the issue.

Here's a diff between (essentially; I removed all of the fields with .values_list() and filled in the constants, the issue remained the same) the normal correct query and the .exists() one:

image
See here the formatted queries in text:

```sql SELECT '1' AS "a" FROM "events_eventregistration" INNER JOIN "events_event" ON ( "events_eventregistration"."event_id" = "events_event"."id" ) LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ( "events_eventregistration"."date_cancelled" IS NULL AND ( "events_event"."start" AT TIME ZONE 'Europe/Amsterdam' )::date >= '2023-09-01' AND CASE WHEN ( "events_eventregistration"."special_price" IS NOT NULL ) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > 0 ) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END HAVING CASE WHEN ( "events_eventregistration"."date_cancelled" IS NULL ) THEN NULLIF( GREATEST( ( COUNT(T3."id") FILTER ( WHERE ( T3."date_cancelled" IS NULL AND ( T3."date" < ("events_eventregistration"."date") OR ( T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date") ) ) ) ) - "events_event"."max_participants" ), 0 ), 0 ) ELSE NULL END IS NULL LIMIT 1 ``` ```sql SELECT "events_eventregistration"."id" FROM "events_eventregistration" INNER JOIN "events_event" ON ( "events_eventregistration"."event_id" = "events_event"."id" ) LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") WHERE ( "events_eventregistration"."date_cancelled" IS NULL AND ( "events_event"."start" AT TIME ZONE 'Europe/Amsterdam' )::date >= '2023-09-01' AND CASE WHEN ( "events_eventregistration"."special_price" IS NOT NULL ) THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END > 0 ) GROUP BY "events_eventregistration"."date_cancelled", "events_event"."max_participants", CASE WHEN "events_eventregistration"."special_price" IS NOT NULL THEN "events_eventregistration"."special_price" ELSE "events_event"."price" END, "events_eventregistration"."id" HAVING CASE WHEN ( "events_eventregistration"."date_cancelled" IS NULL ) THEN NULLIF( GREATEST( ( COUNT(T3."id") FILTER ( WHERE ( T3."date_cancelled" IS NULL AND ( T3."date" < ("events_eventregistration"."date") OR ( T3."id" <= ("events_eventregistration"."id") AND T3."date" = ("events_eventregistration"."date") ) ) ) ) - "events_event"."max_participants" ), 0 ), 0 ) ELSE NULL END IS NULL ```

The difference in the GROUP BY clause is surprising. So I tried adding the "events_eventregistration"."id" grouping key into the .exists() query, and indeed that solves the problem.

So I'm thinking it has to have something to do with the | Q(event__eventregistration__id__lte=F("id")) part in

    queue_position = AnnotationProperty(
        Case(
            # Get queue position by counting amount of registrations with lower date and in case of same date lower id
            # Subsequently cast to None if this is 0 or lower, in which case it isn't in the queue
            # If the current registration is cancelled, also force it to None.
            When(
                date_cancelled=None,
                then=NullIf(
                    Greatest(
                        Count(
                            "event__eventregistration",
                            filter=Q(event__eventregistration__date_cancelled=None)
                            & (
                                Q(event__eventregistration__date__lt=F("date"))
                                | Q(event__eventregistration__id__lte=F("id"))
                                & Q(event__eventregistration__date__exact=F("date"))
                            ),
                        )
                        - F("event__max_participants"),
                        0,
                    ),
                    0,
                ),
            ),
            default=None,
        )
    )
DeD1rk commented 10 months ago

Update: I minimized the difference in GROUP BY like this:

I set queue_position = AnnotationProperty(Count("event__eventregistration") - F("event__max_participants")) to simplify the query as much as possible while keeping the difference in the GROUP BY between the normal and .exists() queries. Here's the result:


-- EventRegistration.objects.select_properties("queue_position").filter(queue_position__isnull=True).values_list("pk").query.get_compiler(using="default").as_sql()
SELECT "events_eventregistration"."id"
FROM "events_eventregistration"
    INNER JOIN "events_event" ON (
        "events_eventregistration"."event_id" = "events_event"."id"
    )
    LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id")
GROUP BY "events_event"."max_participants",
    "events_eventregistration"."id"
HAVING (
        COUNT(T3."id") - "events_event"."max_participants"
    ) IS NULL;

-- EventRegistration.objects.select_properties("queue_position").filter(queue_position__isnull=True).values_list("pk").query.exists().get_compiler(using="default").as_sql()
SELECT %s AS "a"
FROM "events_eventregistration"
    INNER JOIN "events_event" ON (
        "events_eventregistration"."event_id" = "events_event"."id"
    )
    LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id")
GROUP BY "events_event"."max_participants"
HAVING (
        COUNT(T3."id") - "events_event"."max_participants"
    ) IS NULL
LIMIT 1;

I'll look into how that query is built next.

W1ldPo1nter commented 10 months ago

I still didn't have the time to dive deeper into this, but at a glance I can kind of understand the difference in the GROUP BY clauses as my guess would be that Django only adds the id column if it's actually selected since any column in a grouped query must either be part of the GROUP BY clause or an aggregate. Since no real fields are selected in the EXISTS query, Django probably also doesn't add them to the GROUP BY clause.

As an aside: the problem being related to aggregate queries and fixed by adding a grouping key does remind me of this known "issue", but that still wouldn't explain the difference between the regular and the EXISTS query.

One thing you could try with your setup is to perform the same queries without django-queryable-properties, i.e. do not use a QueryablePropertiesManager/QueryablePropertiesQuerySet so that there's no django-queryable-properties code involved and add the annotations represented by the properties the regular way using .annotate() or .alias(). If that yields the same results, then the issue would be Django's construction of EXISTS queries in such cases. If not, then django-queryable-properties really is the problem and I'm going to have to find the underlying cause (which would be interesting since all the library does is somewhat intelligently performing the same thing .annotate() does automatically, without any specific logic regarding EXISTS queries).

DeD1rk commented 10 months ago

The GROUP BY field is missing because: SQLCompiler.get_group_by() determines the grouping fields, based on Query.group_by (this is empty for my case), what's needed for Query.select (which causes the eventregistration id to be added in the normal query, but not in the .exists() one), and what's needed for Query.having (which turns out to only add GROUP BY "events_event"."max_participants").

I find it hard to say where it's going wrong. It could be seen as both:

In any case, it seems like the problem is not actually on django-queryable-properties' side. I can reproduce the difference between the queries without queryable-properties easily like this:

>>> EventRegistration.objects.annotate(queue_position=Count("event__eventregistration") - F("event__max_participants")).filter(queue_position__isnull=True).values_list("pk").query.exists().get_compiler(using="default").as_sql()
('SELECT %s AS "a" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") GROUP BY "events_event"."max_participants" HAVING (COUNT(T3."id") - "events_event"."max_participants") IS NULL LIMIT 1', (1,))

>>> EventRegistration.objects.annotate(queue_position=Count("event__eventregistration") - F("event__max_participants")).filter(queue_position__isnull=True).values_list("pk").query.get_compiler(using="default").as_sql()
('SELECT "events_eventregistration"."id" FROM "events_eventregistration" INNER JOIN "events_event" ON ("events_eventregistration"."event_id" = "events_event"."id") LEFT OUTER JOIN "events_eventregistration" T3 ON ("events_event"."id" = T3."event_id") GROUP BY "events_event"."max_participants", "events_eventregistration"."id" HAVING (COUNT(T3."id") - "events_event"."max_participants") IS NULL', ())

I'll make a cleaner reproduction like that and create a ticket on Django. I guess it's alright to close the issue here.

W1ldPo1nter commented 10 months ago

I'm going to close this as you suggested. If there's any development that indicates that a bug in django-queryable-properties is involved here, let me know and I can re-open the issue.