Open stof opened 4 years ago
Apparently, adding a detection for smarter defaults would be very welcome:
What makes things worse is that only part of these settings are documented, making things even harder to configure properly.
Would it be possible to figure out what settings to use when we'd have the AST from the query parser at hand?
As described in the issue, most of those things can be figured out as the current code validates those. But nobody has taken the time to implement them until now.
I was about to create a similar RFC github issue about the default performance of the Paginator, and I'm glad I spent 10 minutes to use the "search" function to find this ticket. I reached the exact same conclusions as stof after debugging why simple cruds in my webapp take seconds to load. What's not ideal, however, is seeing that this issue remains opened with limited response from the project managers since 2020.
The following part especially is such a cheap change that it hurts seeing it not being in place already:
if the query does not make joins, we are 100% sure we don't fetch join a collection so we can set it [i.e. the fetchJoinCollections] to false
This alone improved some of my queries from 3.5s to 0.1s.
It took me a while to trace how the Paginator builds its queries, and to understand my options, so in case someone reads this comment and wishes to "just get stuff done" (or wishes to compare approaches on how to improve the situation while this ticket remains open), here's what I did (in my case: with Pagerfanta):
use Pagerfanta\Doctrine\ORM\QueryAdapter;
use Pagerfanta\Pagerfanta;
public function createPagerfantaPaginator(
QueryBuilder $queryBuilder,
bool $hasXToManyJoins = null,
// ... other args
): Pagerfanta {
$query = $queryOrQueryBuilder->getQuery();
/*
* Case when the query has any joins, any of which potentially being [X]ToMany: run the more expensive
* paginator queries.
*/
$hasXToManyJoins ??= count($queryOrQueryBuilder->getDQLParts()['join']) > 0;
if (!$hasXToManyJoins) {
$query->setHint(CountWalker::HINT_DISTINCT, false);
}
$adapter = new QueryAdapter(
$query,
fetchJoinCollection: $hasXToManyJoins,
useOutputWalkers: $hasXToManyJoins,
);
// ... continue creating the Pagerfanta paginator using the $adapter created above
}
I'm looking forward to something along the above lines being implemented in the Doctrine Paginator itself, especially (and as stof already mentioned) because:
Thank you.
What's not ideal, however, is seeing that this issue remains opened with limited response from the project managers since 2020.
Ok, I'll be more precise then: this seems like a good idea, and I would be open to reviewing PRs implementing the changes described above, or improving the documentation to include a description of the current settings, as well as merging them of course. Just make sure to send reasonably-sized PRs, with the lowest hanging fruits / less risky stuff dealt with first please.
Thanks for clarifying, and apologies if I sounded too belligerent.
I could spend some time to write up a PR for further discussion. I would be grateful however if you could look at my proposition below and give an "initial ok" for it, so that a rough direction is initially deemed acceptable.
This entire ticket is essentially (and I suspect stof would agree) about doing the following:
If a query doesn't have any joins, or all the joins are *ToOne, then the following defaults should be used:
- "fetchJoinCollections" should be set to false
- "useOutputWalker" should be set to false (unless the query is also subject to the caveats mentioned by stof, like using a HAVING clause)
- "CountWalker::HINT_DISTINCT" hint should be set to false (which would be effective only if "useOutputWalker" is also false)
Proposition:
The $fetchJoinCollections
argument in Paginator::__construct()
becomes nullable and is defaulted to null. When the Paginator is constructed with that argument set to null, an "auto-detection" is going to be carried out by the Paginator on whether the argument should be true or false. Likewise, when $useOutputWalker
is null (and the query does not use a "custom output walker"), then an auto-detection of that argument is also going to be done. In the same vain, when the query does not have a CountWalker::HINT_DISTINCT
set on it, and the $useOutputWalker
is concluded to be false
, then an auto-detection of whether the hint should be set to true or false is also going to be carried out.
In the first PR, the auto-detection is going to be attempted only when a QueryBuilder (instead of a Query) is supplied to the Paginator.
2.1. When an instance of a Query is supplied, and the $fetchJoinCollections
is set to null, then $fetchJoinCollections
is going to be set back to true
(which is the current behaviour).
If the QueryBuilder has no joins, then all the defaults mentioned at the beginning of this comment are going to be set.
I will spend 30 minutes to see whether it's JustTM possible to get the information from the QueryBuilder whether the joins are *ToMany or not. If that's easy to do, this would also be done in the first PR.
Bottom line: very simple code change. No AST interpretation and modification. Pretty much what I did in the code snippet mentioned earlier.
Thoughts?
On paper it sounds OK, I think we might want to make this opt-in at least at first, so that people can try it and report back, then default to null
in a next minor or even major
if we want to be careful.
The Paginator class exposes settings to configure how queries are paginated:
fetchJoinCollections
, which defaults totrue
useOutputWalker
, which defaults to using it if the query does not already have a custom output walker (we can only have one output walker. Not sure things actually work at all if a custom output walker is applied as it would apply on the modified query)CountWalker
(using when not using the output walker) allows to control itsDISTINCT
flag through a query hint storing a boolean, defaulting totrue
The existing defaults are set to maximize compatibility with queries, as they are the one producing the correct result (and supporting the query at all) for most queries. But they are also the one producing the worst queries from a performance point of view. Note that
useOutputWalker
controls the implementation used for getting the results of the slice whenfetchJoinCollections
istrue
and also the implementation used for getting the count.Limitations of various configs:
fetchJoinCollections = false
will produce the wrong results (potentially very badly broken) if the query is actually fetch-joining a collection, due to the fact that an hydrated result spans multiple SQL result in such case.useOutputWalker = false
does not support queries usingHAVING
useOutputWalker = false
does not support entities using a composite identifieruseOutputWalker = false
does not support entities using a foreign key as identifieruseOutputWalker = false
does not support queries using a field from atoMany
relation in theORDER BY
clauseuseOutputWalker = false
does not support queries using multiple entities inFROM
We could have smarter defaults, which would try automatically choose the best behavior (keeping the ability to configure them explicitly for cases where the default cannot be smart enough, as we must still choose correctness and compatibility by default).
For
fetchJoinCollections
:false
toOne
relations, it is also safe to set this flag tofalse
. But this case might be harder to identify based on the query AST.For
useOutputWalker
, the cases of composite identifiers or association identifiers in entities can be detected (the walkers already validate those) and checking forHAVING
and multipleFROM
is easy. Regarding the last case, it is also possible to validate it (the code does it in the tree walker currently). So we could be smarter about using a tree walker. So it would be better to try to use the tree walker as much as possible.For the
CountWalker
, settingDISTINCT
tofalse
might produce a broken count if there are multiple SQL results for the same hydrated result (which might happen in case of joins)In practice, the Paginator class is often used wrapped inside a higher-level API (as it is low-level on purpose) like Pagerfanta, Zend Paginator or KnpPaginator. But it can also be used directly (in a custom higher-level API) as done in https://github.com/symfony/demo. Such higher-level APIs would have 3 choices:
Applying smarter defaults in these higher-level libraries has several drawbacks:
In practice, libraries are either exposing the same defaults or changing them to a way which does not support all queries. And in most cases, this pushes the burden to projects to discover what they should do to get a good performance.