getkirby / ideas

This is the backlog of ideas and feature requests from the last two years. Use our new feedback platform to post your new ideas or vote on existing ideas.
https://feedback.getkirby.com
20 stars 0 forks source link

Page class that emphasizes performance #590

Open hdodov opened 4 years ago

hdodov commented 4 years ago

I'm testing the performance of Kirby with large volumes of content. I have a page with 13,000 children and running:

$entries = site()->page('notes')->children();

...takes 1.14289 seconds on average. However, running:

$path = site()->root() . '/notes';
$entries = Dir::read($path);

...takes just 0.01498 seconds. That's a lot quicker and it got me thinking... if the file system is so fast, perhaps there's a way to cut some corners and preserve that speed?

I managed to get a huge performance boost by simplifying the internal inventory() method via a custom page model:

<?php

use Kirby\Cms\Dir;
use Kirby\Cms\Page;

class NotesPage extends Page
{
    public function inventory(): array
    {
        if ($this->inventory !== null) {
            return $this->inventory;
        }

        $children = [];
        $root = $this->root();
        $listing = Dir::read($root);

        foreach ($listing as $entry) {
            // Ignore files and hidden folders.
            if (strpos($entry, '.')) {
                continue;
            }

            if (preg_match('/^([0-9]+)' . Dir::$numSeparator . '(.*)$/', $entry, $match)) {
                $num  = $match[1];
                $slug = $match[2];
            } else {
                $num  = null;
                $slug = $entry;
            }

            $children[] = [
                'dirname' => $entry,
                'model'   => null,
                'num'     => $num,
                'root'    => "$root/$entry",
                'slug'    => $slug,
            ];
        }

        return $this->inventory = [
            'children' => $children,
            'files'    => [],
            'template' => 'notes'
        ];
    }
}

Now, running:

$entries = site()->page('notes')->children();

...takes just 0.17732 seconds. That's already 644% faster, while you don't lose much functionality. A simple is_dir() call increases that time to 0.44538...

To save even more time, I can change 'model' => null to 'model' => 'note' and assume that all child pages have the note template. Then, in that model, I greatly simplify the constructor:

<?php

use Kirby\Cms\Page;

class NotePage extends Page
{
    public function __construct(array $props)
    {
        $this->dirname = $props['dirname'] ?? null;
        $this->setNum($props['num'] ?? null);
        $this->root = $props['root'] ?? null;
        $this->slug = $props['slug'] ?? null;
        $this->parent = $props['parent'] ?? null;
        $this->site = $props['site'] ?? null;
        $this->isDraft = $props['isDraft'] ?? false;
    }
}

...and this brings the time down to 0.03485 seconds!


I'm sure we could save a lot of time by skipping or hardcoding parts of the logic. Kirby could provide a QuickPage class that functions the same way as the Page class, but takes a lot more shortcuts to improve performance. Then, you can extend that class in your custom page models to get the benefit. Static properties could be used for configuration. For example:

<?php

use Kirby\Cms\QuickPage;

class NotesPage extends QuickPage
{
    static $assumedTemplate = 'notes';
    static $assumedChildTemplate = 'note';
    // and so on...
}

Then, since child pages are assumed to be of template note, you can use QuickPage for them as well:

<?php

use Kirby\Cms\QuickPage;

class NotePage extends QuickPage
{
    // Uses the simplified constructor.
}
bastianallgeier commented 4 years ago

This is super impressive! The inventory part has always been tricky in order to get all the functionality built in without messing up the performance too much. Your performance optimizations clearly show how many trade-offs are still made in order to get the correct template or keep the constructor easy to extend.

I would love to have a way to improve this. Maybe even on config settings.

bastianallgeier commented 4 years ago

I wonder if we could do something like this:

/notes/index.php

return [
  'model' => 'NotesPage',
  'template' => 'notes',
  'children'  => [
    'template' => 'note',
    'model' => 'NotePage',
  ]
];

I'm not entirely sure about the naming of that file yet, but if Kirby finds such a file it would automatically load that and optimize the Page class according to those settings.

hdodov commented 4 years ago

@bastianallgeier where would that index.php file be? In the content folder? I think that using page models would be more straightforward and feel more natural. I mean, that's kind of their point, isn't it? Besides, users could extend it further and squeeze even more performance out of it. Otherwise, you would need to use both this index.php and a model.

Also, using a different Page class means Kirby could even provide different "strategies" for optimizations. For example, you could have QuickPage that works one way and TurboPage that works another way (obviously, not using these names). Then, in your model, you use whichever one fits best. Additionally, plugins could offer their own optimized Page classes for even greater flexibility.

hdodov commented 4 years ago

In the panel, even with the optimizations from my first comment, the pages section for the notes page is still sluggish. This request:

http://localhost/api/pages/notes/sections/pages?page=1

...takes 3.35 seconds. The cause appears to be, quite ironically, the "loop for the best performance". 😄 If I comment it out, the response time drops to 0.01434 seconds and the whole experience becomes very snappy.

nilshoerrmann commented 4 years ago

My guess is that effect is only noteworthy for large collections? So usually the developers know when they handle that kind of large content. So why not have an extra param for the page method? E. g. $page('notes', ['some' => 'assumptions']).

bastianallgeier commented 4 years ago

You are probably right that models are a better place for that.

I know why that loop is slow. It's all about the $page->isReadable() filter. That one is quite a beast.

hdodov commented 4 years ago

@nilshoerrmann the developers know it, yes, but Kirby doesn't. Therefore, the panel will still be slower. By using a page model, everything is faster. I actually prefer it because all logic is self-contained. If I decide that I don't need this performance boost at some point or it causes a huge issue, I can simply delete the page model. No refactors needed.

hdodov commented 4 years ago

@bastianallgeier continuing with the panel, the other slow request I found out was with the single-page view, i.e. the note template. This request:

http://localhost/api/pages/notes+cp1846baxpte5192?view=panel

...takes 3.37 seconds to finish. It appears to be due to the intendedTemplate filter when getting the prev and next pages for the navigation. If I turn that filter to a no-op, the response time drops to 0.10311 seconds. If Kirby can somehow use the parent model notes to see that the assumed template for all children is note, it wouldn't have to make this check for real.

I'm very excited because with this, the huge amount of data makes no difference to the user experience. It feels as if I'm navigating a site with 10 pages, not 10000. It's unbelievable...

hdodov commented 4 years ago

@bastianallgeier further digging in the isReadable() method - it appears that the call to intendedTemplate() is the cause for this issue as well. If I hardcode $template = 'note' - it's blazing fast.

It seems that allowing Kirby to assume the template of a page can bring tremendous performance boosts.

Edit: Come to think of it... you can extend the intendedTemplate() in a custom model:

class NotePage extends Page
{
    public function intendedTemplate()
    {
        if ($this->intendedTemplate !== null) {
            return $this->intendedTemplate;
        }

        return $this->setTemplate('note')->intendedTemplate();
    }
}

And it's also kind of a no-brainer. If a page uses the NotePage model, isn't it obvious that the intended template is the note template? Perhaps Kirby could safely assume that always. I guess I should open that in another issue, though. It fixes both problems I mentioned above regarding the requests.

hdodov commented 4 years ago

I added some more pages and I'm testing with exactly 100,000 now.

bastianallgeier commented 4 years ago

Holy shit! Your research is pure gold! I wonder if it makes sense to start thinking about cashing those bottlenecks for pages where models don't work/exist.

hdodov commented 4 years ago

There might be a clever way to cache, but I'm pretty sure that models will be needed to provide the best possible result. For example, indexing 10,000 pages without any optimizations takes 1.21909 seconds. If I remove this natsort() call, the time drops to 1.12818 seconds. That's a 7.45% improvement, which is pretty significant.

However, sorting the files is a feature that users will want in 90% of the cases. But for the other 10% where performance is more important - that'll be a problem. I believe the solution isn't to pick one or the other, but to give developers the option pick the one they need. Maybe we can "toggle" features like that with static properties. For example:

class NotesPage extends Page
{
    public static $natsort = false;
}
nilshoerrmann commented 4 years ago

Couldn't these toggles be set inside the blueprints?

bastianallgeier commented 4 years ago

@nilshoerrmann it would erase all performance benefits if we need to parse the blueprint first.

nilshoerrmann commented 4 years ago

Right. I was just asking because it's getting hard to keep track of all settings if they spread across so many places in a project (blueprint, core, model). This is especially true when you're getting back to a project after a few months. And blueprints are usually the place where sorts, templates and such a set up. So this needs to be documented carefully.

lukasbestle commented 4 years ago

I agree with @nilshoerrmann and @hdodov that introducing more and more concepts (like a special performance optimization config file) would make it all a lot more confusing. From all the variants discussed in this issue, my favorite is putting the options into the model. Advantages: Kirby wouldn't need to load extra files and as this is an advanced feature, it fits quite well into the model.

What about an array for all such options:

<?php

class NotesPage extends Page
{
    protected static $performanceOptions = [
        ...
    ];
}

Our core Page class could define defaults and the models could set just the options the user wants changed. In the Page class, these could then be merged dynamically. Advantage of a single array property: We could extend this later without using up dozens of property names.

nilshoerrmann commented 4 years ago

@lukasbestle: In order to make it easier to understand what's happening in an install, it would be great if such performance options were printed (if set) when using the dump() helper on a page object. This would help those of us who don't consider themselves programmers to debug a site ;) Thanks!

lukasbestle commented 4 years ago

That's doable! 👍