Closed ShaheedHaque closed 3 years ago
I forgot to add...
I am using django-debug-toolbar to see what queries are used in the view. I can also use django-debug-toolbar's CLI to construct and run queries by hand. Using the CLI, I can see that the number and content of roundtrips does not seem to explode:
$ ./manage.py debugsqlshell
...
In [4]: qs = PayRun.objects.filter(company__legal_name='Advanced Mechanics').defer('snapshot')
In [5]: prefetch_queryable_properties(qs, 'last_run')
SELECT t.oid,
typarray
FROM pg_type t
JOIN pg_namespace ns ON typnamespace = ns.oid
WHERE typname = 'hstore' [0.83ms]
SELECT typarray
FROM pg_type
WHERE typname = 'citext' [0.16ms]
SELECT "viewflow_process"."id",
"viewflow_process"."flow_class",
"viewflow_process"."status",
"viewflow_process"."created",
"viewflow_process"."finished",
"viewflow_process"."artifact_content_type_id",
"viewflow_process"."artifact_object_id",
"viewflow_process"."data",
"paiyroll_payrun"."process_ptr_id",
"paiyroll_payrun"."schedule_id",
"paiyroll_payrun"."custom",
"paiyroll_payrun"."on_demand",
"paiyroll_payrun"."company_id",
"paiyroll_payrun"."frequency",
"paiyroll_payrun"."approved",
"paiyroll_payrun"."target_t",
"paiyroll_payrun"."actual_t",
"paiyroll_payrun"."next_t",
"paiyroll_payrun"."result",
"paiyroll_payrun"."session_id"
FROM "paiyroll_payrun"
INNER JOIN "paiyroll_company" ON ("paiyroll_payrun"."company_id" = "paiyroll_company"."id")
INNER JOIN "viewflow_process" ON ("paiyroll_payrun"."process_ptr_id" = "viewflow_process"."id")
WHERE "paiyroll_company"."legal_name" = 'Advanced Mechanics'
ORDER BY "viewflow_process"."created" DESC [1.96ms]
SELECT "paiyroll_payrun"."process_ptr_id",
COALESCE(to_char(
(SELECT U0."created"
FROM "viewflow_task" U0
INNER JOIN "viewflow_process" U1 ON (U0."process_id" = U1."id")
INNER JOIN "paiyroll_payrun" U2 ON (U1."id" = U2."process_ptr_id")
WHERE (U0."flow_task" = 'paiyroll/flows.payrun.PayRunFlow.run_payrun'
AND U2."process_ptr_id" = "paiyroll_payrun"."process_ptr_id")
ORDER BY U0."created" DESC
LIMIT 1), 'YYYY-MM-DD HH24:MI:ss'), '') AS "last_run"
FROM "paiyroll_payrun"
INNER JOIN "viewflow_process" ON ("paiyroll_payrun"."process_ptr_id" = "viewflow_process"."id")
WHERE "paiyroll_payrun"."process_ptr_id" IN (129,
132,
...
383)
ORDER BY "viewflow_process"."created" DESC [4.84ms]
and the query then runs no extra round trips:
In [6]: for o in qs:
...: print(o.frequency,o.last_run) <<<<<<<<<<<<<<<< frequency is a normal CharField, and
<<<<<<<<<<<<<<<< last_run is a queryable_property
...:
m1 2021-08-12 09:21:26
m1 2021-08-05 15:03:26
...
m1 2019-04-07 11:49:56
whereas in the GUI case, django-debug-toolbar shows about 25 extra queries for the frequency
field, one for each instance of the pay run:
SELECT "viewflow_process"."id",
"viewflow_process"."flow_class",
"viewflow_process"."status",
"viewflow_process"."created",
"viewflow_process"."finished",
"viewflow_process"."artifact_content_type_id",
"viewflow_process"."artifact_object_id",
"viewflow_process"."data",
"paiyroll_payrun"."process_ptr_id",
("paiyroll_payrun"."snapshot" -> 'company') AS "snapshot_company",
("paiyroll_payrun"."snapshot" -> 'pay_run') AS "snapshot_pay_run",
("paiyroll_payrun"."snapshot" -> 'report_definition') AS "snapshot_report_definition",
("paiyroll_payrun"."snapshot" -> 'report_template') AS "snapshot_report_template",
("paiyroll_payrun"."snapshot" -> 'code_version') AS "snapshot_code_version"
FROM "paiyroll_payrun"
INNER JOIN "paiyroll_company"
ON ("paiyroll_payrun"."company_id" = "paiyroll_company"."id")
INNER JOIN "viewflow_process"
ON ("paiyroll_payrun"."process_ptr_id" = "viewflow_process"."id")
WHERE ("paiyroll_payrun"."company_id" = 173 AND "paiyroll_company"."client_id" = 52 AND "paiyroll_payrun"."actual_t" >= '2021-04-06'::date AND "paiyroll_payrun"."actual_t" <= '2022-04-05'::date)
ORDER BY "paiyroll_payrun"."actual_t" DESC, "viewflow_process"."created" DESC
LIMIT 25
...
... This real query has several more queryable_properties, and is generally more complex, details not shown for brevity
...
SELECT "paiyroll_payrun"."process_ptr_id",
"paiyroll_payrun"."frequency"
FROM "paiyroll_payrun"
WHERE "paiyroll_payrun"."process_ptr_id" = 24073
LIMIT 21 25 similar queries.
...
SELECT "paiyroll_payrun"."process_ptr_id",
"paiyroll_payrun"."frequency"
FROM "paiyroll_payrun"
WHERE "paiyroll_payrun"."process_ptr_id" = 24288
LIMIT 21 25 similar queries.
Hey, thanks for the report.
First off: the code you've shown in the second post isn't actually that good of a use case for prefetch_queryable_properties
. This utility is intended to be used whenever you are not already performing a query for the objects, but getting them from somewhere else, e.g. if you were loading your PayRun
instances from a cache instead of querying them from the DB directly. That's why the debug SQL shows two queries: one for the objects themselves and one for the property values. You could do both things in a single query by simply using .select_properties(...)
as described here. In your case, the queryset could be constructed like this:
PayRun.objects.filter(company__legal_name='Advanced Mechanics').defer('snapshot').select_properties('last_run')
Besides that, I'm somewhat confused due to everything seemingly working correctly in your debug shell, which makes me think that the problem may somehow be related to your view itself and unrelated to queryable properties, but I obviously can't say for certain. Are there other queryset operations involved beside the .filter(...)
and the .defer(...)
calls shown in the example? Are other queryable properties being queried, maybe even properties of a related model? I would like to try and reproduce the problem using the test setup.
Could you also tell me which Django version you are using?
It's probably worth saying that I'm teetering on the edge of my comfort zone with the ORM generally and queryable_properties is somewhat dragging me over the edge, so it is quite possible I am being a noob.
I do however see your point abut the second query. Note also that I am only really using prefetch_queryable_properties()
even in the case of the first query because I am retrofitting this to the existing stack of mixins and so on that generate the query. It let's me conveniently "post process" my abstract query representation to do the prefetch, rather than rework the stack with "native" support for .select_properties()
.
The Django version is 3.2.7.
The query is, as I hinted...complicated. I'm not sure what is relevant, but:
PayRun
is a subclass of viewflow.Process
, which has an implementation of polymorphism across its subclasses, and which affects (or rather may affect) the querysetRoughly speaking, the query should be doing something like this:
PayRun.objects. \
filter(company__somestuff='xxx', a_date__gt='xxx', a_date__lte='xxx'). \ # A company, for a given date range.
defer('snapshot'). \ # Don't fetch huge JSON field.
annotate(...a few of my own...). \ # Some extracts of the huge JSON field.
annotate(...whatever magic queryable_properties does...) # 5 separate queryable_properties,
# one of which joins a Task model
# to the Process model.
P.S. I have been trying to debug things. I am currently in QueryablePropertiesIterable.__iter__()
trying to see why the list of columns to be fetched seems to be being truncated to just the fields in Process
, and dropping the extra fields of the PayRun
.
OK, the fields from PayRun were being dropped from the query because in my ignorance of the ORM's .defer() capability, one of my mixins which was dealing with the polymorphic aspect of using Viewflow's Process was explicitly killing the missing fields.
Apologies for wasting your time, and thanks for your quick reply in any event.
No worries, I didn't even invest that much time into it yet (wanted to play around with the inheritance + .defer()
+ .filter()
combination later today myself). I'm glad that it's working out for you and that there's no bug here.
I recently added usage of version 1.7.0 of this library to implement some features, with excellent results. However, I recently noticed that access to normal model fields (i.e. based on
django.db.models.Field
notQueryablePropertyDescriptor
) was behaving as if each of the normal fields had hadqueryset.defer('field_name')
applied to it. This causes an explosion in the number of round trips by roughlynumber-of-fields * queryset-instances
, and thus terrible performance at even moderate scale.Of course, I only noticed this when it got real data to play with, and long after merging, so any help would be most welcome :-).
Here are some details of a case that might be relevant:
self.get_queryset()
does.defer('snapshot')
because that is a very large JSON field, though it also uses.annotate()
to fetch a subset of the data insnapshot
.prefetch_queryable_properties()
to reduce the number of round trips they incur.Posting all the relevant code is rather impractical, but I am of course happy to provide any details (or debug anything) which might be helpful.