getgrav / grav

Modern, Crazy Fast, Ridiculously Easy and Amazingly Powerful Flat-File CMS powered by PHP, Markdown, Twig, and Symfony
https://getgrav.org
MIT License

performance scaling issues #931

Closed. fabrizioT closed this issue 3 years ago.

fabrizioT commented 8 years ago

Developer since the '90s here. I'm currently trying Grav and find it pretty interesting.

That said, I think performance is disappointing for sites with about 1,000 pages or more. I'm used to working with 10,000+ page sites, so I can't recommend Grav to my customers, even though I think it's clever.

I'm testing Grav on a site with about 2,000 pages and I get 3-8 second load times on the server (dedicated). I've already tweaked realpath_cache_size and installed OPcache plus the YAML PECL extension as well. Sure, it's faster on SSDs, but that's overkill.

I know it's flat-file based, but I think some more efficient native YAML caching is warranted. I'd guess that would fix the main bottleneck.

Keep up the good work.

rhukster commented 8 years ago

When asked, we have always said Grav is best suited to sites under 1,000 pages. We have had people use it for larger sites, but that does require decent hardware (SSDs are important, fast memory, etc.) and sensible caching.

https://learn.getgrav.org/advanced/performance-and-caching

The thing is that without a database, Grav has to check whether files have changed in order to know if the cache needs to be flushed. A fast filesystem becomes critical as the page count goes up. You can turn off this check and things will of course run faster, but the initial load that gathers up all the 'header' information will still be a bit slow.
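To make that concrete, here is a minimal sketch of what such a freshness check amounts to (hypothetical names, not Grav's actual API):

```php
<?php
// Sketch of the freshness check described above: fold the modification
// times of every page file into one checksum and compare it against the
// value stored with the cache. Hypothetical, not Grav's actual code.
function pages_checksum(string $pagesDir): string
{
    $mtimes = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($pagesDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        $mtimes[$file->getPathname()] = $file->getMTime();
    }
    ksort($mtimes);
    return md5(serialize($mtimes)); // changes whenever any page changes
}

// If this differs from the checksum saved with the cache, the whole
// cache is stale and must be rebuilt, hence the cost on a slow disk.
```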

There are a variety of plugins to help, for example we have a page cache plugin that effectively caches the content of each page. Also there's a pre-cache plugin that kicks off and runs out-of-process and caches every page's content so it's already pre-cached and ready when a user requests it.

Installing the native PHP YAML parser extension helps quite a bit, as does PHP 7.0, and these combined with SSD drives make Grav run even faster. We already cache things aggressively, but there is room for improvement for sure.
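As a rough illustration of why the native extension matters, a parser can prefer the C implementation whenever it is loaded (a sketch, assuming Symfony's YAML component as the fallback, which is not necessarily Grav's exact wiring):

```php
<?php
// Prefer the native PECL yaml extension (a C parser) and fall back to
// the slower pure-PHP Symfony parser. Illustrative wiring only.
use Symfony\Component\Yaml\Yaml;

function parse_yaml_fast(string $content): array
{
    if (function_exists('yaml_parse')) {
        $data = yaml_parse($content); // native extension: much faster
        if ($data !== false) {
            return (array) $data;
        }
    }
    return (array) Yaml::parse($content); // pure-PHP fallback
}
```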

We had parsed-YAML caching implemented and it made a huge difference, but due to nagging bugs we removed it for the time being. We are also looking at implementing some other performance features specifically for larger sites (mentioned here: https://getgrav.org/blog/plans-for-2016), including logic to better handle how the cache is built and maintained, so that the whole cache is not discarded when a change occurs; only the page(s) that changed should be recached. This would speed up the 'initial' caching process, and when a page changes the speed impact would be minimal.
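A minimal sketch of that per-page invalidation idea (the helper names and the PSR-16 cache are assumptions, not Grav's actual implementation): keying each cache entry by path plus mtime means an edit only invalidates that one page's entry.

```php
<?php
use Psr\SimpleCache\CacheInterface;

// Hypothetical sketch: the cache key changes when the file changes, so
// only edited pages miss the cache; everything else stays warm.
function load_page(string $path, CacheInterface $cache): array
{
    $key = 'page-' . md5($path . filemtime($path));
    $page = $cache->get($key);
    if ($page === null) {
        // parse_page() stands in for the expensive YAML + Markdown work.
        $page = parse_page(file_get_contents($path));
        $cache->set($key, $page);
    }
    return $page;
}
```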

We are also looking at an optional plugin or feature for large sites that uses an index of some sort to keep track of file-change timestamps, perhaps with an offline update mechanism, so we don't have to check those timestamps on every single page load.

These performance enhancements are coming. In the meantime, for smaller sites < 1000 pages, Grav is an amazing fit. For larger sites, a database-based CMS is probably a better bet.

fabrizioT commented 8 years ago

I think partial cache-refresh mechanics plus a parsed-YAML cache will cut it.

Looking forward to these improvements, as "bigger" sites do play a role in the real world; otherwise I'd say Grav will stay a bit "niche".

Thank you.

rhukster commented 8 years ago

Well, a bit niche is OK with me :) The web is generally moving towards smaller sites, and Grav was built with the express purpose of handling smaller sites better than 'general' CMSes like Joomla, Drupal, and WordPress.

It does this already, but we don't want to limit ourselves, so we'll continue to improve things and make Grav even faster for smaller sites, with the added benefit that this will let it scale further. However, being flat-file means it is just not going to scale to the level of a database solution when sites get very large. I think 10k pages is a worthy goal to aspire to, though. That would cover something like 99% of the sites out there :)

ghost commented 8 years ago

I wasn't aware that Grav cannot handle larger sites well. @rhukster, will this issue be addressed in coming versions? Right now it isn't a problem for me, but within the next six months or so it will be.

rhukster commented 8 years ago

It's definitely something we plan on addressing in the coming releases.

mahagr commented 8 years ago

I have some other ideas on how to improve performance; one of the obvious ones is to create a simple database (SQLite or MySQL) to contain the index data. Another improvement I have in mind is to make more selective passes over the file timestamp checks: there's really no need to go through or update all the pages on every single page load. There are at least two ways to do that: one is to do the checks/updates in smaller batches; another is to check for updates only when the page is requested.
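For illustration, a minimal sketch of that index idea using SQLite (schema, file locations, and paths here are hypothetical):

```php
<?php
// Hypothetical SQLite index: one small database answers "has this page
// changed?" instead of stat()ing thousands of files per request.
$db = new PDO('sqlite:cache/pages.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS pages (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL
)');

$path = 'user/pages/01.home/default.md'; // example page

// Batch/offline pass: refresh the index entry for a page.
$update = $db->prepare('REPLACE INTO pages (path, mtime) VALUES (?, ?)');
$update->execute([$path, filemtime($path)]);

// On request: a single indexed lookup replaces a filesystem stat.
$select = $db->prepare('SELECT mtime FROM pages WHERE path = ?');
$select->execute([$path]);
$cachedMtime = (int) $select->fetchColumn();
```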

Isn't the YAML file caching on by default? It's been a few months since I really looked into the code.

Rarst commented 7 years ago

Can I ask what performance margin one should expect from Grav over a "general" CMS?

Empirically, on my local machine, the core + admin install of Grav seems to run at ~60ms, which is on par with my WordPress install on the same machine.

rhukster commented 7 years ago

Well, it really depends on your hardware and the site itself. The admin is not going to be as fast as the frontend, as it's a plugin and is doing more work than the frontend does.

With PHP 7 and Apache on my Mac, I expect < 50ms page rendering for most sites. A comparable WordPress or Joomla site would typically be in the 500ms range. So in my usage, Grav is usually around 5-10X faster. It really depends on the kind of sites you are dealing with, though.

Rarst commented 7 years ago

I am talking ~60ms for the front end. Same Apache, same PHP 7 w/ OPcache, same APCu user cache (as the object cache for WP), and decent hardware (Core i7, SSD).

I know that WP grinds to a halt if you pile a lot of crap on it. My observation is that Grav's core boot time is not significantly faster than WP's, apples to apples.

rhukster commented 7 years ago

What are you running, with what plugins? A Core i7 with SSD, PHP 7 w/ OPcache, etc. should be way faster than that. Even on my 2008 Mac I get < 15ms processing times for the Grav blog skeleton.

BTW, I'm talking about times as rendered by Grav itself (turn on the Grav debugger to see them), not the times seen by the browser, which include the Apache/browser communication overhead.
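For anyone following along, the debugger toggle lives in user/config/system.yaml; this should be the relevant setting in Grav 1.x:

```yaml
debugger:
  enabled: true  # render the debug bar with per-request timings
```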

rhukster commented 7 years ago

[screenshot: Grav debugger timings, 2017-01-04 at 1:44 pm]

The current site I'm developing, for example.

Rarst commented 7 years ago

Here is what I am seeing with the debugger enabled (as above: vanilla core + admin, two pages of content):

[screenshot: grav-debugger]

rhukster commented 7 years ago

Hmm... it doesn't look like any one thing; something overall is just quite slow on your machine. Is this Windows or a Mac?

mahagr commented 7 years ago

I'm getting ~10ms (Antimatter) and ~20ms (Helium/Gantry, though it has a lot more going on in the page) page load times on the Helium demo. In the latest WP I get ~50ms load times (WP performance has really improved a lot with PHP 7 and in their latest versions).

I have an i5-6600K, fast SSD, Ubuntu, PHP 7, OPcache, APCu, Apache... Just remember that if you have Xdebug turned on (I don't atm), it affects Grav more than it does WP.

w00fz commented 7 years ago

Via HTTPS [h2 protocol]
Grav v1.1.12 - Admin v1.2.7
Apache 2.4.23, PHP 7.0.14, OPcache 7.0.14, Xdebug off

2.8 GHz Intel Core i7, 16 GB 1600 MHz DDR3, SSD drive

macOS 10.12.2

[screenshot: home page debugger timings, 2017-01-04 22:46:54]

Rarst commented 7 years ago

> Hmm... it doesn't look like any one thing; something overall is just quite slow on your machine. Is this Windows or a Mac?

Windows, Xdebug disabled. This is puzzling, then, because other things run fast (as I mentioned, I get about 60-70ms out of WordPress and 10-20ms out of my own stuff on Silex), so it doesn't seem like a generally slow stack. Maybe my SSD is getting old; Blackfire profiles point to a lot of filesystem activity.

Will try on my notebook later today.

Rarst commented 7 years ago

Here is how Blackfire breaks down the slowest operations in my case; file_exists() alone adds up to 20+ms:

[screenshot: Blackfire profile of Grav]

Are these call counts normal for Grav?

Rarst commented 7 years ago

I'm getting the same results on my notebook, which has newer and faster storage (twin SSDs in RAID), also on Windows.

So far it seems like either something in my stack doesn't like that much filesystem access, or for some reason Grav generates excessive filesystem access under some circumstances. 300+ hits to the filesystem is a lot more than I am used to seeing; I'm not sure if that is simply an aspect of the flat-file architecture or whether something is going wrong with it [on Windows?].

mahagr commented 7 years ago

File operations that check for file existence, modification time, etc. should be cached by PHP and even by the underlying filesystem. That said, you can turn most of the file checks off, but then you need to clear the cache manually whenever your content changes.

Rarst commented 7 years ago

It seems file_exists() is known to be quite a bit slower on Windows compared to *nixes; see http://stackoverflow.com/a/7767723/886380

I was able to squeeze out about a 30% speedup by increasing realpath_cache_size in the PHP config, but no more than that. Also note that PHP doesn't cache misses for these calls, and from poking through the queries, some locations seem to be looked up repeatedly.
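For reference, the knobs in question live in php.ini; the values below are illustrative, not a recommendation:

```ini
; php.ini: cache resolved real paths so repeated lookups skip the disk
realpath_cache_size = 4096k
realpath_cache_ttl  = 600
```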

In a nutshell, the current architecture seems disk-intensive and relies on a favorable filesystem implementation. Something to consider for scaling.

mahagr commented 7 years ago

I do have some caching on file_exists() calls, but I didn't think I needed to do much work there because it was so fast for me. It's something that can surely be improved rather easily now that I know it's an issue on Windows. Before, it just wasn't worth it.

The reason you see so many calls to file_exists() is the streams we use to abstract the filesystem. Streams act like a virtual drive, which can have both overrides and custom locations for individual files or paths. The feature is really useful in many ways, as you don't need to know where the files are located, and it makes it really easy to customize almost whatever you want.
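Roughly, the lookup works like this (a simplified sketch; the schemes and search paths are examples, not Grav's actual list):

```php
<?php
// Each logical URI is tried against several physical locations in
// priority order, so one resolve can cost several file_exists() calls.
function resolve_stream(string $uri, array $locations): ?string
{
    foreach ($locations as $scheme => $paths) {
        if (strpos($uri, $scheme) === 0) {
            $relative = substr($uri, strlen($scheme));
            foreach ($paths as $prefix) {
                $candidate = $prefix . $relative;
                if (file_exists($candidate)) { // one check per candidate
                    return $candidate;
                }
            }
        }
    }
    return null; // not found in any mapped location
}

// Example mapping: user overrides win over system defaults.
$found = resolve_stream('theme://css/site.css', [
    'theme://' => ['user/themes/mytheme/', 'system/themes/default/'],
]);
```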

mahagr commented 7 years ago

Mis-click there.

After reading the link you gave... have you noticed that there are a few options in Grav that attempt to minimize the number of file checks?

Rarst commented 7 years ago

Yes, I did experiment with the configuration, but I haven't observed anything that improves this specific aspect. It seems that many of the calls originate in the boot process and asset-related code, which aren't significantly affected by the caching settings.

mahagr commented 7 years ago

Try the following settings in user/config/system.yaml:

cache:
  enabled: true
  check:
    method: none     # skip file-modification checks entirely

twig:
  debug: false
  auto_reload: false # don't recompile Twig templates on change

This turns off a lot of the file existence/modification tests once the cache has been primed. Just note that these settings are really annoying if you are actively developing the site, as you'll need to clear the cache after every change you make to the files.

Rarst commented 7 years ago

I did, as noted above; it doesn't make a significant difference to this issue.

mahagr commented 7 years ago

Similar issue in #1239

OleVik commented 7 years ago

Luckily, most of this is only obvious in development, as common live production servers don't seem to experience much of this at all, since they meet many of the suggested performance recommendations. Has anyone done testing on Windows while juggling the caching options? A comparative look at Grav with and without the performance recommendations would also be interesting.

It's been discussed several times, and thousands of pages is still out of scope for Grav, but I think the whole idea of such scaling can be set aside until a partial, differential caching mechanism is in place. Thousands of files are always a hassle to read compared to a lightweight database, and even squeezing every last drop out of the system won't solve the structural issues behind the approach.

As @rhukster wrote last summer, changes are planned to deal with this, though I'm sure some extensive, comparative testing on our part wouldn't go amiss.