doctrine / orm

Doctrine Object Relational Mapper (ORM)
https://www.doctrine-project.org/projects/orm.html
MIT License
9.92k stars 2.51k forks source link

[RFC] Smarter paginator default #8278

Open stof opened 4 years ago

stof commented 4 years ago

The Paginator class exposes settings to configure how queries are paginated:

The existing defaults are set to maximize compatibility with queries, as they are the one producing the correct result (and supporting the query at all) for most queries. But they are also the one producing the worst queries from a performance point of view. Note that useOutputWalker controls the implementation used for getting the results of the slice when fetchJoinCollections is true and also the implementation used for getting the count.

Limitations of various configs:

We could have smarter defaults, which would try automatically choose the best behavior (keeping the ability to configure them explicitly for cases where the default cannot be smart enough, as we must still choose correctness and compatibility by default).

For fetchJoinCollections:

For useOutputWalker, the cases of composite identifiers or association identifiers in entities can be detected (the walkers already validate those) and checking for HAVING and multiple FROM is easy. Regarding the last case, it is also possible to validate it (the code does it in the tree walker currently). So we could be smarter about using a tree walker. So it would be better to try to use the tree walker as much as possible.

For the CountWalker, setting DISTINCT to false might produce a broken count if there are multiple SQL results for the same hydrated result (which might happen in case of joins)

In practice, the Paginator class is often used wrapped inside a higher-level API (as it is low-level on purpose) like Pagerfanta, Zend Paginator or KnpPaginator. But it can also be used directly (in a custom higher-level API) as done in https://github.com/symfony/demo. Such higher-level APIs would have 3 choices:

Applying smarter defaults in these higher-level libraries has several drawbacks:

In practice, libraries are either exposing the same defaults or changing them to a way which does not support all queries. And in most cases, this pushes the burden to projects to discover what they should do to get a good performance.

greg0ire commented 4 years ago

Apparently, adding a detection for smarter defaults would be very welcome:

https://github.com/doctrine/orm/blob/107ba93d79b3e4dc9a8b0b7903245ed738a811a2/docs/en/tutorials/pagination.rst#L42-L43

stof commented 4 years ago

What makes things worse is that only part of these settings are documented, making things even harder to configure properly.

mpdude commented 1 year ago

Would it be possible to figure out what settings to use when we'd have the AST from the query parser at hand?

stof commented 1 year ago

As described in the issue, most of those things can be figured out as the current code validates those. But nobody has taken the time to implement them until now.

d-ph commented 2 months ago

I was about to create a similar RFC github issue about the default performance of the Paginator, and I'm glad I spent 10 minutes to use the "search" function to find this ticket. I reached the exact same conclusions as stof after debugging why simple cruds in my webapp take seconds to load. What's not ideal, however, is seeing that this issue remains opened with limited response from the project managers since 2020.

The following part especially is such a cheap change that it hurts seeing it not being in place already:

if the query does not make joins, we are 100% sure we don't fetch join a collection so we can set it [i.e. the fetchJoinCollections] to false

This alone improved some of my queries from 3.5s to 0.1s.


It took me a while to trace how the Paginator builds its queries, and to understand my options, so in case someone reads this comment and wishes to "just get stuff done" (or wishes to compare approaches on how to improve the situation while this ticket remains open), here's what I did (in my case: with Pagerfanta):

use Pagerfanta\Doctrine\ORM\QueryAdapter;
use Pagerfanta\Pagerfanta;

public function createPagerfantaPaginator(
    QueryBuilder $queryBuilder,
    bool $hasXToManyJoins = null,
    // ... other args
): Pagerfanta {
    $query = $queryOrQueryBuilder->getQuery();

    /*
     * Case when the query has any joins, any of which potentially being [X]ToMany: run the more expensive
     * paginator queries.
     */
    $hasXToManyJoins ??= count($queryOrQueryBuilder->getDQLParts()['join']) > 0;

    if (!$hasXToManyJoins) {
        $query->setHint(CountWalker::HINT_DISTINCT, false);
    }

    $adapter = new QueryAdapter(
        $query,
        fetchJoinCollection: $hasXToManyJoins,
        useOutputWalkers: $hasXToManyJoins,
    );

    // ... continue creating the Pagerfanta paginator using the $adapter created above
}

I'm looking forward to something along the above lines being implemented in the Doctrine Paginator itself, especially (and as stof already mentioned) because:

  1. This affect everyone, including the higher-level Paginator tools. It takes everyone's time to discover this problem and to action it.
  2. The Paginator already accepts an instance of QueryBuilder in its constructor, so it already is in a position to choose better defaults based on the wealth of information that the QueryBuilder conveys about the DQL query being built.

Thank you.

greg0ire commented 2 months ago

What's not ideal, however, is seeing that this issue remains opened with limited response from the project managers since 2020.

Ok, I'll be more precise then: this seems like a good idea, and I would be open to reviewing PRs implementing the changes described above, or improving the documentation to include a description of the current settings, as well as merging them of course. Just make sure to send reasonably-sized PRs, with the lowest hanging fruits / less risky stuff dealt with first please.

d-ph commented 2 months ago

Thanks for clarifying, and apologies if I sounded too belligerent.

I could spend some time to write up a PR for further discussion. I would be grateful however if you could look at my proposition below and give an "initial ok" for it, so that a rough direction is initially deemed acceptable.

This entire ticket is essentially (and I suspect stof would agree) about doing the following:

If a query doesn't have any joins, or all the joins are *ToOne, then the following defaults should be used:

  • "fetchJoinCollections" should be set to false
  • "useOutputWalker" should be set to false (unless the query is also subject to the caveats mentioned by stof, like using a HAVING clause)
  • "CountWalker::HINT_DISTINCT" hint should be set to false (which would be effective only if "useOutputWalker" is also false)

Proposition:

  1. The $fetchJoinCollections argument in Paginator::__construct() becomes nullable and is defaulted to null. When the Paginator is constructed with that argument set to null, an "auto-detection" is going to be carried out by the Paginator on whether the argument should be true or false. Likewise, when $useOutputWalker is null (and the query does not use a "custom output walker"), then an auto-detection of that argument is also going to be done. In the same vain, when the query does not have a CountWalker::HINT_DISTINCT set on it, and the $useOutputWalker is concluded to be false, then an auto-detection of whether the hint should be set to true or false is also going to be carried out.

  2. In the first PR, the auto-detection is going to be attempted only when a QueryBuilder (instead of a Query) is supplied to the Paginator.

2.1. When an instance of a Query is supplied, and the $fetchJoinCollections is set to null, then $fetchJoinCollections is going to be set back to true (which is the current behaviour).

  1. If the QueryBuilder has no joins, then all the defaults mentioned at the beginning of this comment are going to be set.

  2. I will spend 30 minutes to see whether it's JustTM possible to get the information from the QueryBuilder whether the joins are *ToMany or not. If that's easy to do, this would also be done in the first PR.


Bottom line: very simple code change. No AST interpretation and modification. Pretty much what I did in the code snippet mentioned earlier.

Thoughts?

greg0ire commented 2 months ago

On paper it sounds OK, I think we might want to make this opt-in at least at first, so that people can try it and report back, then default to null in a next minor or even major if we want to be careful.