Closed rsomani95 closed 6 months ago
The workaround is to always query for at least one attr, and only include totalCount
and edgeCount
at the bottom. Specifically, something like:
{
customObjects {
edges {
node {
id
}
}
totalCount
}
}
We use total_count
as well, but have always included it after edges, so we never noticed. We've also noticed that the filter processing doesn't account for fragments, unlike the optimization compiler, so there is some improvement I need to do for that as well. Will start working on this next.
Fixed this and another bug I found related to selecting cursor
before node
in edges (caused optimization to stop prematurely), and added support for finding filters from fragments. Released in 0.2.5.
@MrThearMan I believe there's still a bug with this on nested relay fields.
With this setup given the latest 0.2.5
release:
class PropertyManagerNode(IsTypeOfProxyPatch, DjangoObjectType):
housing_companies = DjangoConnectionField(HousingCompanyNode)
class Meta:
model = PropertyManagerProxy
interfaces = (relay.Node,)
filterset_class = PropertyManagerFilterSet
connection_class = CustomConnection
class HousingCompanyNode(IsTypeOfProxyPatch, DjangoObjectType):
real_estates = DjangoConnectionField(RealEstateNode)
class Meta:
model = HousingCompanyProxy
interfaces = (relay.Node,)
filterset_class = HousingCompanyFilterSet
connection_class = CustomConnection
The totalCount
is incorrect for nested fields. Here's how I tested this. First, I run this query:
{
pagedPropertyManagers {
totalCount
edgeCount
edges {
node {
id
housingCompanies {
totalCount
edgeCount
edges {
node {
id
}
}
}
}
}
}
}
For which, here are the first node's results:
{
"data": {
"pagedPropertyManagers": {
"totalCount": 10,
"edgeCount": 10,
"edges": [
{
"node": {
"id": "UHJvcGVydHlNYW5hZ2VyTm9kZTo1",
"housingCompanies": {
"totalCount": 2,
"edgeCount": 2,
"edges": [
{
"node": {
"id": "SG91c2luZ0NvbXBhbnlOb2RlOjU="
}
},
{
"node": {
"id": "SG91c2luZ0NvbXBhbnlOb2RlOjE1"
}
}
]
}
}
},
...
Now, if I change the query to:
{
pagedPropertyManagers {
totalCount
edgeCount
edges {
node {
id
housingCompanies(first: 1) {
totalCount
edgeCount
edges {
node {
id
}
}
}
}
}
}
}
The totalCount
on the housingCompanies
node should still show 2
(as that is indeed the total count as seen above), but it shows as 1
, which is the edge count. This is not a filtering issue. In my own project, I have totalCount
on my nested fields that are greater than the relay max page size, but the displayed totalCount
is the same as the edgeCount
.
This is related to how nested relay fields are limited. In the connection field, we do queryset.count()
before limiting, but for prefetched rows, the windowing function is applied to the queryset during optimization, and when we get to that objects connection field, queryset.count()
will use the result_cache
, which is now same as edge count. We would need to annotate the pre-windowed count to each row and pick that out if it exists in the connection field. I'll see if that's doable.
Ok, I managed to add a way to annotate the pre-windowed queryset count to the models in the prefetched querysets. The connection field will use this value if it's found instead of queryset.count()
.
One slight annoyance in the implementation was that I needed to use a subquery, since a regular Count aggregate would be limited by the window function. Also, we only need the value once, but the prefetched queryset will contain it for all returned models (and we then use it from the first one). At most, the subquery happens for RELAY_CONNECTION_MAX_LIMIT
number of rows, and I'm not sure if or how databases will optimize this. In any case, we cannot add a queryset.count()
outside of the prefetch, since that would result in N additional queries. Hopefully this is optimal enough. Released in 0.2.6.
Thanks, that's interesting. I was able to verify that this works. Worth noting that though the no. of SQL queries are the same, the SQL time before the fix was ~40ms, where after is ~200ms. But as you say, this is just the cost of work being done.
Worth noting: the query takes longer even if totalCount
is not being queried for (maybe that's scope for future optimisation?)
Sharing the below for completeness / reference:
I'm using a postgresql DB, and the here are the queries generated before vs. after:
Before:
SELECT "col1",
"col2"
FROM (SELECT *
FROM (SELECT "segment"."id" AS
"col1",
"segment"."asset" AS
"col2",
Row_number()
over (
PARTITION BY "segment"."asset") AS
"qual0"
FROM "segment"
WHERE "segment"."asset" IN (
'194eec67-9d09-442e-892a-285f6a0a28ac' :: uuid,
'2d0718e5-3667-4e4d-9252-c15857c0c593' :: uuid,
...
'dd7787b2-7e6d-43c4-a729-56101c8d0fee' :: uuid,
'930dfa25-9be5-448d-ab5d-05aae3736dc9' :: uuid ))
"qualify"
WHERE ( "qual0" >= 0
AND "qual0" <= 100 )) "qualify_mask"
After:
SELECT "col1",
"col2",
"_optimizer_count"
FROM (SELECT *
FROM (SELECT "segment"."id" AS
"col1",
"segment"."video_asset_id" AS
"col2",
(SELECT Count(*)
FROM (SELECT U0."id",
U0."asset"
FROM "segment" U0
WHERE U0."asset" = (
"segment"."asset" ))
_count) AS
"_optimizer_count",
Row_number()
over (
PARTITION BY "segment"."asset") AS
"qual0"
FROM "segment"
WHERE "segment"."asset" IN (
'194eec67-9d09-442e-892a-285f6a0a28ac' :: uuid,
'2d0718e5-3667-4e4d-9252-c15857c0c593' :: uuid,
...
'dd7787b2-7e6d-43c4-a729-56101c8d0fee' :: uuid,
'930dfa25-9be5-448d-ab5d-05aae3736dc9' :: uuid ))
"qualify"
WHERE ( "qual0" >= 0
AND "qual0" <= 100 )) "qualify_mask"
5x slowdown on all queries is definitely enough to warrant further optimization, as I already see this as an uncommon use case.
A small issue is that I would need to make a configuration for indicating which field on the connection is for totalCount
, and make the optimization based on that, and if the field is not selected, then connection.lenght
could be different. I think this should be an OK compromise.
Makes sense. I think that's a totally reasonable compromise!
Added a setting called TOTAL_COUNT_FIELD
, which is totalCount
by default, and can be used to point the field on the connection field that needs information from connection.length
. If this field is selected in the query, the optimizer will add the subquery annotation for fetching correct total count, otherwise it will skip this for better performance. Released on 0.2.7.
I have read the docs thoroughly before making this bug report.
I have read through other open issues, and my issue is not a duplicate.
What version of the library you are using?
0.2.3
Which python version are you using?
3.12
What operating system are you on?
Mac
Description
Inspired by this comment, I was leveraging a custom
Connection
class to get the total count and edge count of any relay node being queried in the following manner:Then, given the node inherits
CustomDjangoObjectType
, I was able to do:This used to work with version
0.1.11
, but doesn't since0.2.0
. I believe this is because of how_get_arguments
works where there's a fair assumption that we have to go deeper to get the actual fields of a relay node.When I run the above query, I get the following error:
which steps from this line: https://github.com/MrThearMan/graphene-django-query-optimizer/blob/017fd8add2a7710bf7347ee4df7b858110ee36c5/query_optimizer/utils.py#L205
Interestingly, if I write the query this way:
I do not have an issue.
However, if I try this, I still run into the same bug:
I'm not tied to using the
CustomConnection
mechanism, but it's very helpful to have atotalCount
andedgeCount
as top level fields on the nodes. Would it be possible to add back support for this, or some other mechanism to achieve the same?