dimagi / django-cte

Common Table Expressions (CTE) for Django
Other
334 stars 46 forks source link

Can't use CTE within a subquery? #15

Closed SebCorbin closed 4 years ago

SebCorbin commented 4 years ago

ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

I have a pretty complex query, but thanks to your package, I've managed to make it an ORM based one (which is useful for annotation).

But I've got a problem: I'm wondering why I can't use CTE in a subquery, here's the query I'm trying to achieve (yes, it's complex, and I've already simplified it):

SELECT V0."id"
FROM "plant" V0
WHERE (CASE
           WHEN (V0."plantation_date" > 2020 - 03 - 30
               OR (V0."death_date" IS NOT NULL
                   AND V0."death_date" <= 2020 - 03 - 30))
               THEN -1
           WHEN (
                   (WITH RECURSIVE cte AS
                                       (SELECT "observation"."plant_id",
                                               "observation"."date",
                                               "observation"."emission",
                                               LEAD("observation"."emission", 1)
                                               OVER (ORDER BY "observation"."date" DESC) AS "prev_emission"
                                        FROM "observation"
                                        WHERE "observation"."date" <= 2020 - 03 - 30
                                               AND "observation".plant_id = V0.plant_id
                                        ORDER BY "observation"."date" DESC)
                    SELECT CASE
                               WHEN (LEAD(U0."date", 1)
                                     OVER (ORDER BY U0."date" DESC) IS NOT NULL
                                   AND EXTRACT(
                                               'year' FROM
                                               LEAD(U0."date", 1)
                                               OVER (ORDER BY U0."date" DESC)
                                           ) =
                                       EXTRACT('year' FROM U0."date"))
                                   THEN EXTRACT('year' FROM U0."date")
                               ELSE NULL
                               END AS "last_emissions_year"
                    FROM "cte" U0
                    WHERE U0."emission" IS DISTINCT
                        FROM U0."prev_emission"
                    ORDER BY U0."date" DESC
                    LIMIT 1) = 2020
               ) THEN 2
           ELSE 0
           END = 2
          )

Which translates as

at_date = date.today()

cte = With(
    Observation.objects.filter(
        date__lte=at_date,
    ).annotate(
        prev_emission=Window(
            Lead('emission'),
            order_by=F('date').desc()
        )
    ).order_by('-date').values('plant_id', 'date', 'emission', 'prev_emission')
)
qs = cte.queryset().with_cte(cte).annotate(
    other_field=Window(Lead('date'), order_by=F('date').desc()),
    last_emissions_year=Case(
        When(
            Q(other_field__isnull=False) &
            Q(other_field__year=ExtractYear('date')),
            then=ExtractYear('date')
        ),
    ),
).filter(
    emission__isdistinctfrom=F('prev_emission')
).order_by('-date').values('last_emissions_year')

Plant.objects.annotate(
    last_emissions_year=Subquery(qs[:1]),
    plant_state=Case(
        When(
            Q(plantation_date__gt=at_date) | (
                    Q(death_date__isnull=False) &
                    Q(death_date__lte=at_date)
            ),
            then=STATE_INACTIVE,
        ),
        When(
            Q(
                last_emissions_year__isnull=False,
                last_emissions_year=at_date.year,
            ),
            then=STATE_EMISSION_END,
        ),
        default=STATE_NO_EMISSION,
        output_field=IntegerField(),
    ),
).filter(plant_state=STATE_EMISSION_END)

The one above works fine but is missing AND "observation".plant_id = V0.plant_id in the CTE, and when I add plant_id=OuterRef('pk') to the CTE

cte = With(
    Observation.objects.filter(
        plant_id=OuterRef('pk'),
        date__lte=at_date,
    ).annotate(
        prev_emission=Window(
            Lead('emission'),
            order_by=F('date').desc()
        )
    ).order_by('-date').values('plant_id', 'date', 'emission', 'prev_emission')
)

I get the error mentioned, any ideas?

millerdev commented 4 years ago

@SebCorbin there are tests that use Subquery in conjunction with OuterRef, so that is a hopeful sign.

Perhaps try attaching the extra where clause element directly on the subquery, outside of the CTE

-    last_emissions_year=Subquery(qs[:1]),
+    last_emissions_year=Subquery(qs.filter(plant_id=OuterRef('pk'))[:1]),

Maybe that will change the meaning of your query in a bad way?

It's pretty hard for me to get a clear picture of what's going wrong without being able to run some code. Could you to write a failing test for django-cte or at least a query using the test models so I can reproduce the error?

Also, what version of Django are you using?

SebCorbin commented 4 years ago

Yes, attaching the filter clause is modifying the query in a bad way. I need the reference inside the CTE, and after having a look a the tests, the OuterRef is always used outside the With() object.

I've tried adding a test with the PR #16 but I'm not comfortable this is a real query (and of course it could be done way simpler).

My version of Django is 3.0.5.