Open hdodov opened 1 year ago
As your proposed solution would fundamentally break 100% of all existing websites, it's not really an option. In many cases, reading a file will also be much slower than checking the filename. So this would slow down other use cases tremendously.
But I understand your described problem case and we will look into it if there is a solution to it.
Reading files would be slower, yes, but if they're PHP files, as I've previously suggested, it might not be.
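To illustrate the thinking (purely hypothetical, not a format Kirby supports today): a content file that is itself PHP could be served from OPcache, so "reading" it becomes including an already compiled file rather than parsing text from disk. Something like:

<?php

// Hypothetical PHP content file (not an existing Kirby format).
// With OPcache enabled, including this file can be cheaper than
// reading and parsing a .txt file, because the compiled version
// stays in shared memory.
return [
    'model' => 'project',
    'title' => 'My project',
    'text'  => 'Lorem ipsum…'
];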
You're going to be focusing on Kirby 4, aren't you? Seems like a good opportunity to introduce some breaking changes in the name of making things better.
@hdodov We have to agree to disagree here. Breaking sites that drastically for the 99% of users who will also move to v4 is not something we know our customers would value. Getting the performance problems you mention resolved in another way would certainly be great.
A migration script could make things a lot more bearable. I had no problems writing a script that turns all TXT files into PHP ones. It would be just as easy to have it rename the files and store their original names in the resulting file contents.
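For what it's worth, such a script can be little more than a directory walk. A rough sketch in plain PHP (hypothetical: it assumes the default content folder path, the page.txt naming and Model field proposed in the description below, ignores language variants like project.en.txt, and assumes no page.txt already exists):

<?php

// Hypothetical migration sketch: rename model-specific content files
// (e.g. project.txt) to a fixed page.txt and record the model name
// inside the file as a regular field.
$contentRoot = __DIR__ . '/content';        // assumption: default content folder
$models      = ['project', 'article'];      // assumption: your model names

$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($contentRoot, FilesystemIterator::SKIP_DOTS)
);

foreach ($iterator as $file) {
    if ($file->getExtension() !== 'txt') {
        continue;
    }

    $model = $file->getBasename('.txt');

    if (in_array($model, $models, true) === false) {
        continue; // not a model content file
    }

    // prepend the original model name as a field (Kirby separates fields with ----)
    $content = file_get_contents($file->getPathname());
    $content = 'Model: ' . $model . "\n\n----\n\n" . $content;

    file_put_contents($file->getPathname(), $content);
    rename($file->getPathname(), $file->getPath() . '/page.txt');
}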
Personally, as a user, I don't think it's so dramatic, as long as Kirby 4 pairs its breaking changes with a migration script.
As you've said in the other issue:
we need to first sort out things, refactoring big parts of the system etc.
I'm not sure it's reasonable to think you can refactor big parts of the system and keep 100% of the users happy and free of breaking changes… unless you introduce a ton of additional complexity that makes things even worse.
I don't think your users are as fragile as you think. I'm making those performance changes on a site that's over three years old and, believe me, we don't have time for huge refactors. I'm focusing solely on changes that don't require one.
Here's my two cents: breaking changes, combined with a well-written guide and a migration script, would be more than enough for a major version of the project, where such things are expected anyway.
It's your call, though. 🙌
@hdodov could you please test if https://github.com/getkirby/kirby/commit/3f30e98ea85fdc6720d9aff7f9ea14f29abfce30 does improve performance for you at all? (or if you could share your setup/site for me to test, but I understand if that's not possible)
Ok, might be even slower 😞
Before:
After:
Test:
$models = ['model-a', 'model-b', 'model-c', 'model-d', 'model-e', 'model-f'];

Dir::make($this->tmp);

foreach (range(1, 5000) as $i) {
    Dir::make($this->tmp . '/sub' . $i);
    F::write($this->tmp . '/sub' . $i . '/image.jpg', '');
    F::write($this->tmp . '/sub' . $i . '/image.txt', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf', '');
    F::write($this->tmp . '/sub' . $i . '/document.txt', '');
    F::write($this->tmp . '/sub' . $i . '/' . $models[array_rand($models)] . '.txt', '');
}

$totals = [
    'with'    => [],
    'without' => []
];

$runs = 50;

foreach (range(1, $runs) as $i) {
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['without'][] = microtime(true) - $start;
}

Page::$models = $models;

foreach (range(1, $runs) as $i) {
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['with'][] = microtime(true) - $start;
}

var_dump('
Without models: ' . A::average($totals['without'], 7) . '
With models: ' . A::average($totals['with'], 7)
);
I can't test it right away because we use an older version of Kirby (3.5.7.1), which also has some modifications to handle issues we've had in the past.
I raised the issue here because the code in the latest version is identical and therefore has the same bottleneck.
@distantnative about the benchmark: maybe it's not important and changes nothing, but try clearing the stat cache between runs. file_exists is cached, while glob is not (afaik). This might give file_exists an "unfair" advantage in the benchmark.
$models = ['model-a', 'model-b', 'model-c', 'model-d', 'model-e', 'model-f'];

Dir::make($this->tmp);

foreach (range(1, 5000) as $i) {
    Dir::make($this->tmp . '/sub' . $i);
    F::write($this->tmp . '/sub' . $i . '/image.jpg', '');
    F::write($this->tmp . '/sub' . $i . '/image.jpg.txt', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf', '');
    F::write($this->tmp . '/sub' . $i . '/document.pdf.txt', '');
    F::write($this->tmp . '/sub' . $i . '/' . $models[array_rand($models)] . '.txt', '');
}

$totals = [
    'with'    => [],
    'without' => []
];

$runs = 50;

foreach (range(1, $runs) as $i) {
    clearstatcache();
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['without'][] = microtime(true) - $start;
}

Page::$models = $models;

foreach (range(1, $runs) as $i) {
    clearstatcache();
    $start = microtime(true);
    Dir::inventory($this->tmp);
    $totals['with'][] = microtime(true) - $start;
}

var_dump('
Without models: ' . A::average($totals['without'], 7) . '
With models: ' . A::average($totals['with'], 7)
);
Another possible solution that came to my mind: what if we use Dir::read() on the folder of the child instead of looping through all models? This would convert potentially dozens of file_exists() calls into one scandir() call.
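For illustration, the idea could look roughly like this (a sketch only, reusing $root, $contentExtension and Page::$models from the snippets below; not the code that was actually benchmarked):

// One directory read per page instead of one file_exists() per model
$files = Dir::read($root);
$model = null;

foreach (array_keys(Page::$models) as $name) {
    // check for e.g. "project.txt" in the directory listing
    if (in_array($name . '.' . $contentExtension, $files, true) === true) {
        $model = $name;
        break;
    }
}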
I tried this (directly with scandir() and then some array intersect logic), as well as the glob attempt. But neither seemed to be reliably better. Then again, as @rasteiner pointed out, maybe my testing is flawed.
Testing with clearstatcache() between runs as suggested by @rasteiner:
main branch:

scandir intersect:

$models = array_keys(Page::$models);

if (count($models) > 0) {
    $model = array_values(array_intersect($models, scandir($root)))[0] ?? null;
}
glob:

$models = implode(',', array_keys(Page::$models));

$before  = $root . '/';
$after   = '.' . $contentExtension;
$pattern = $before . '{' . $models . '}' . $after;

if ($models = glob($pattern, GLOB_NOSORT|GLOB_BRACE)) {
    $model = Str::between($models[0], $before, $after);
}
So I think glob is really out of the question, but scandir() could bring some improvements. Although it has a much steeper performance drop from 0 to 1 model, while the current implementation rather worsens with an increasing number of models.
Maybe more apparent with 20 models:

main branch:

scandir intersect:

Since we know the number of models, we could use a hybrid approach: use file_exists() for maybe up to five models and scandir() for more than five.
I think that looks promising. It's already an improvement for the current setup, no matter if we ever change the content file naming structure.
Could you add a comment above the if like "For many models, one call to scandir() is faster than many calls to file_exists()" so we will be able to remember why we did that?
Back at it again, this time with proper performance testing.
Test setup has 26 child pages, each with a normal content file and two images plus their meta files.

Current implementation:

scandir alternative:
So already here the alternative looks like it would only improve the situation with 20+ models. Even less so if we consider pages with more than two images and meta files, i.e. quite a lot of files that scandir would have to deal with.

Consequently, I'm not that convinced by that solution.

If anyone has new ideas, please share them. Otherwise I'd say that part of the code is as good as our current solutions allow for now.
Description
This is how Kirby determines the model for a given page:
https://github.com/getkirby/kirby/blob/9ecd11cd5b7d4875d4212f767a2b3ad3e52b3528/src/Filesystem/Dir.php#L346-L353
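Paraphrased, that code boils down to a loop of roughly this shape (an illustration only, not the verbatim Kirby source):

// For every child page: try each registered model name until a
// matching content file (e.g. project.txt) exists in the page folder
foreach (array_keys(Page::$models) as $name) {
    if (file_exists($root . '/' . $name . '.' . $contentExtension) === true) {
        $model = $name;
        break;
    }
}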
It means that for each child page, it checks every possible model for its existence. If I have 10 models, that means 10 file_exists() checks. Of course, if your model starts with "a", it will alphabetically be first and you might get away with just one iteration before the loop breaks. But that's not guaranteed.

On our site, we have about 900 pages and 7 models. That's 6300 filesystem calls to index all pages. Add just one more model and we're up to over 7000…
Our panel currently takes ~2.5 seconds to load. If I delete our models, that time drops down to ~0.4 seconds. In other words, the use of models makes our panel 6 times slower. Outside of the panel, our home page is also about 3 times slower.
Possible Solution
You wouldn't have to make all those filesystem calls if the filename is predetermined. All TXT files of pages could just be page.txt (the same way site.txt works) and you could store the model type inside the page, like so:
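For instance, a page.txt could carry the model as just another field (the Model field name is only a suggestion; Kirby separates fields with ---- lines):

Title: My project

----

Model: project

----

Text: Lorem ipsum…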