getkirby / kirby

Kirby's core application folder
https://getkirby.com
Other
1.27k stars 167 forks source link

Extremely slow site due to the use of models #5126

Open hdodov opened 1 year ago

hdodov commented 1 year ago

Description

This is how Kirby determines the model for a given page:

https://github.com/getkirby/kirby/blob/9ecd11cd5b7d4875d4212f767a2b3ad3e52b3528/src/Filesystem/Dir.php#L346-L353

It means that for each child page, it checks every possible model for its existence. If I have 10 models, that means 10 file_exists() checks. Of course, if your model starts with "a", it will alphabetically be first and you might get away with just one iteration, then the loop breaks. But that's not guaranteed.

In our site, we have about 900 pages and 7 models. That's 6300 filesystem calls to index all pages. Adding just one more model and we're up to over 7000…

Our panel currently takes ~2.5 seconds to load. If I delete our models, that time drops down to ~0.4 seconds. In other words, the use of models makes our panel 6 times slower. Outside of the panel, our home page is also about 3 times slower.

Possible Solution

You wouldn't have to make all those filesystem calls if the filename is predetermined. All TXT files of pages could just be page.txt (the same way site.txt works) and you could store the model type inside the page, like so:

Model: article

----

Title: Hello World
distantnative commented 1 year ago

As your proposed solution would fundamentally break 100% of all existing websites, it's not really an option. In many cases, reading a file will also be much slower than checking the filename. So this would slow down other use cases tremendously.

But I understand your described problem case and we will look into it if there is a solution to it.

hdodov commented 1 year ago

Reading files would be slower, yes, but if they're PHP files, as I've previously suggested - it might not be.

You're going to be focusing on Kirby 4, aren't you? Seems like a good opportunity to introduce some breaking changes in the name of making things better.

distantnative commented 1 year ago

@hdodov We have to agree to disagree here - breaking sites that drastically for 99% of the users also going to v4 is not something that we know our customers would value. Getting the performance problems you mention resolved in another way, certainly would be great.

hdodov commented 1 year ago

A migration script could make things a lot more bearable. I had no problems making a script that turns all TXT files to PHP ones. It'd be as easy to also make it rename files and put their names in the resulting contents of the file.

Personally, as a user, I don't think it's so dramatic. If Kirby 4 introduces a lot of breaking changes and there's a migration script:

As you've said in the other issue:

we need to first sort out things, refactoring big parts of the system etc.

I'm not sure it's reasonable to think you can refactor big parts of the system and keep 100% of the users happy and free of breaking changes… unless you introduce a ton of additional complexity that makes things even worse.

I don't think your users are as fragile as you think. I'm making those performance changes in a site that's over 3 years old and, believe me, we don't have time for huge refactors. I'm focusing solely on changes that don't need such.

Here's my two cents: breaking changes, combined with a well-written guide and migration script would be more than enough for a major version of the project, where such things are expected anyway.

It's your call, though. 🙌

distantnative commented 1 year ago

@hdodov could you please test if https://github.com/getkirby/kirby/commit/3f30e98ea85fdc6720d9aff7f9ea14f29abfce30 does improve performance for you at all? (or if you could share your setup/site for me to test, but I understand if that's not possible)

distantnative commented 1 year ago

Ok, might be even slower 😞

Before:

Screenshot 2023-03-28 at 15 20 39

After:

Screenshot 2023-03-28 at 15 19 08

Test:


$models = ['model-a', 'model-b', 'model-c', 'model-d', 'model-e', 'model-f'];

Dir::make($this->tmp);

foreach (range(1, 5000) as $i) {
    Dir::make($this->tmp . '/sub' . $i);
    F::write($this->tmp . '/sub' . $i . '/image.jpg', '');
    F::write($this->tmp . '/sub' . $i . '/image.txt', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf', '');
    F::write($this->tmp . '/sub' . $i . '/document.txt', '');
    F::write($this->tmp . '/sub' . $i . '/' . $models[array_rand($models)] . '.txt', '');
}

$totals = [
    'with'    => [],
    'without' => []
];

$runs = 50;

foreach (range(1, $runs) as $i) {
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['without'][] = microtime(true) - $start;
}

Page::$models = $models;

foreach (range(1, $runs) as $i) {
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['with'][] = microtime(true) - $start;
}

var_dump('
    Without models: ' . A::average($totals['without'], 7) . '
    With models:' . A::average($totals['with'], 7)
);
hdodov commented 1 year ago

I can't test it right away because we use an older version of Kirby (3.5.7.1), which also has some modifications, in order to handle issues we've had in the past.

I raised the issue here, because the code in the latest version is identical and therefore has the same bottleneck.

rasteiner commented 1 year ago

@distantnative about the benchmark, maybe it's not important and changes nothing, but try clearing the stat cache between runs. file_exists is cached, while glob is not (afaik). This might give file_exists an "unfair" advantage in the benchmark.


$models = ['model-a', 'model-b', 'model-c', 'model-d', 'model-e', 'model-f'];

Dir::make($this->tmp);

foreach (range(1, 5000) as $i) {
    Dir::make($this->tmp . '/sub' . $i);
    F::write($this->tmp . '/sub' . $i . '/image.jpg', '');
    F::write($this->tmp . '/sub' . $i . '/image.jpg.txt', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf.txt', '');
    F::write($this->tmp . '/sub' . $i . '/' . $models[array_rand($models)] . '.txt', '');
}

$totals = [
    'with'    => [],
    'without' => []
];

$runs = 50;

foreach (range(1, $runs) as $i) {
    clearstatcache();
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['without'][] = microtime(true) - $start;
}

Page::$models = $models;

foreach (range(1, $runs) as $i) {
    clearstatcache();
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['with'][] = microtime(true) - $start;
}

var_dump('
    Without models: ' . A::average($totals['without'], 7) . '
    With models: ' . A::average($totals['with'], 7)
);
lukasbestle commented 1 year ago

Another possible solution that came to my mind:

What if we use Dir::read() on the folder of the child instead of looping through all models? This would convert potentially dozens of file_exists() calls to one scandir() call.

distantnative commented 1 year ago

I tried this (directly with scandir() and then some array intersect logic. As well as the glob attempt. But all didn't seem to be reliably better. But as @rasteiner pointed out, maybe my testing is flawed.

distantnative commented 1 year ago

Testing with clearstatcache() between runs as suggested by @rasteiner:

main branch:

Screenshot 2023-04-02 at 15 26 09

Alternative 1

$models = array_keys(Page::$models);
if (count($models) > 0) {
    $model = array_intersect($models, scandir($root))[0] ?? null;
}
Screenshot 2023-04-02 at 15 24 10

Alternative 2

$models  = implode(',', array_keys(Page::$models));
$before  = $root . '/';
$after   = '.' . $contentExtension;
$pattern = $before . '{' . $models . '}' . $after;
if ($models = glob($pattern, GLOB_NOSORT|GLOB_BRACE)) {
    $model = Str::between($models[0], $before, $after);
}
Screenshot 2023-04-02 at 15 28 17

So I think glob is really out of the question but scandir() could bring some improvements. Although it has a much steeper performance drop from 0 to 1 model. While the current implementation rather worsen with the increasing number of models.

distantnative commented 1 year ago

Maybe more apparent with 20 models

main branch

Screenshot 2023-04-02 at 15 37 15

scandir intersect

Screenshot 2023-04-02 at 15 38 09
lukasbestle commented 1 year ago

Since we know the number of models, we could use a hybrid approach: Use file_exists() for maybe up to five models and scandir() for more than five.

distantnative commented 1 year ago

https://github.com/getkirby/kirby/commit/50addd33a781af762953d02b27823ee59dde6de8

lukasbestle commented 1 year ago

I think that looks promising. It's already an improvement for the current setup, no matter if we ever change the content file naming structure.

Could you add a comment above the if like "For many models, one call to scandir() is faster than many calls to file_exists()" so we will be able to remember why we did that?

distantnative commented 1 year ago

https://github.com/getkirby/kirby/pull/5136

distantnative commented 8 months ago

Back at it again, this time with proper performance testing.

Test setup has 26 child pages with each a normal content file and two images + meta files.

Current implementation:

Screenshot 2024-01-21 at 13 58 12

scandir alternative:

Screenshot 2024-01-21 at 13 59 14

So already here the alternative looks like it would only improve the situation with 20+ models. Even less if we consider pages with more than 2 images and meta files but quite a lot of files that scandir would have to deal with.

Not that convinced by that solution in consequence.

If anyone has new ideas, please share them - otherwise, I'd think that part of code is as good as we have solutions for for now.