Open quicksketch opened 9 years ago
Yea performance! On Feb 13, 2015 9:42 PM, "Nate Haug" notifications@github.com wrote:
We've already optimized Backdrop to the point that a database connection is not necessary if you're using an alternative cache-backend like Memcache. Even though the database is unnecessary except for the single query to get the cached page, we still need that database connection just for that single query.
I was experimenting with using a flat-file cache as an alternative, a basic port of https://www.drupal.org/project/filecache. The results were fairly promising running ab -c 100 -n 100 http://backdrop.local/:
Database cache:
Requests per second:    569.45 [#/sec] (mean)
Time per request:       175.608 [ms] (mean)
Time per request:       1.756 [ms] (mean, across all concurrent requests)
Transfer rate:          12078.02 [Kbytes/sec] received
File cache:
Requests per second:    741.53 [#/sec] (mean)
Time per request:       134.857 [ms] (mean)
Time per request:       1.349 [ms] (mean, across all concurrent requests)
Transfer rate:          15727.74 [Kbytes/sec] received
So roughly a 30% performance increase in cached page delivery. The downside is that the disk becomes littered with nonsense cache files. But as far as disk space goes, this is no different from the page cache, which takes up just as much room, but it's hidden away in the database.
If we can test this approach on various servers and consistently get a similar increase, I think we might consider bundling a flat-file cache and using it by default for the page cache.
— Reply to this email directly or view it on GitHub https://github.com/backdrop/backdrop-issues/issues/716.
this sounds very cool; I'd be glad to test locally, or on my Linode, or both
I just deleted my entire sandbox for the second time (file_unmanaged_delete_recursive() is a dangerous tool). I'm not up for porting this a 3rd time tonight, so I'm afraid further benchmarks are going to have to wait.
oh no! ...wait, it is:

file_unmanaged_delete_recursive() is a dangerous tool
Oh the memories this brought back!! ...back in the 1990s when we were learning DOS loaded from floppies, our teacher told us "NEVER-EVER use format c: !!!" :smile:
Thanks klonos, that makes me :smile:
It's been a little bit of a rough night in programming land. :stuck_out_tongue_winking_eye:
If we use two servers, do we have to cache two versions? On cache flushing, how do we make sure caches are flushed on all servers? Thanks
@andytruong in the event that you had two servers, you probably already need to share the files directory between the two of them (usually over an NFS mount), so the cache would still be shared between the servers just like any other file in the public files directory. However, in such a situation, you could switch the cache back to the database, or use another central storage like memcache. Lastly, if you had two servers, you'd probably also have Varnish or nginx in front of them distributing traffic between them, in which case the Backdrop page cache becomes inconsequential.
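For the multi-server case, the escape hatch could be as simple as a settings.php override. A sketch, assuming Backdrop keeps Drupal 7's per-bin cache class convention; the exact setting names below are assumptions and should be verified against core's default.settings.php:

```php
<?php
// settings.php sketch. Setting names follow the Drupal 7 convention
// (cache_default_class and per-bin cache_class_CACHEBIN) that Backdrop
// inherited; verify against your Backdrop version before relying on them.

// Single server: keep the page cache in flat files.
$settings['cache_class_cache_page'] = 'BackdropFileCache';

// Multi-server (e.g. files directory on a shared NFS mount): send every
// bin back to the database, or to a central backend like memcache.
$settings['cache_default_class'] = 'BackdropDatabaseCache';
```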
Overall, I think this would result in some situations (single-server) being faster than before while impacting other situations (multi-server). It would make simple sites faster, while sites with more complicated architectures may need to make adjustments to optimize. Multi-server environments already require a lot of additional planning, so this may be worth the tradeoff.
...so this may be worth the tradeoff.
Until we get the metrics gathering implemented we cannot be sure, but Backdrop is aiming for the low-end market, and these use cases usually have a single server. So if the speed gain is considerable, I say we go ahead and implement this as the default. There could be a toggle in the advanced settings during setup to allow switching to the database from day 1, and/or a toggle in /admin/config/development/performance
to accommodate use cases where people add multiple servers later in the site life cycle.
...at the very least, and if setting it as the default is dangerous, we could simply add the toggle in /admin/config/development/performance
and include warning text about this feature not being suitable for multi-server setups.
@quicksketch can you explain what you mean by flat-file?
I probably can jump on this task, if you explain it.
@Gormartsen so my intention here had been to implement the full extent of a cache backend that wrote to files in the public files directory, e.g. files/cache/$bin/$salted_sha1. The contents of this file would have to include not only the cached content, but also the created and expiration times of the cache entry. Things get a little more complicated when dealing with cache clears, however. Due to the requirements of clearing data by cache key prefix (and possibly, in the future, by cache tags), we need to keep a record of cache entries by their non-hashed file names, in addition to the hashed files themselves.
So this would result in slower writes and cache clears, but really fast reads. For the page cache, it would be particularly well suited, because cache hits would no longer require a database connection at all. It would not be well-suited to situations where cache writes are frequent, as it would generally be slower than a database cache because you're writing in two locations.
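The read/write path described above can be sketched in a few dozen lines. This is a simplified illustration, not the proposed implementation: the class name is made up, a real backend would implement BackdropCacheInterface, and locking and error handling are omitted.

```php
<?php
// Minimal sketch of a flat-file cache backend (hypothetical class name;
// a real backend would implement BackdropCacheInterface).
class FileCacheSketch {
  protected $dir;
  protected $salt;

  public function __construct($dir, $salt) {
    $this->dir = $dir;
    $this->salt = $salt;
    if (!is_dir($dir)) {
      mkdir($dir, 0700, TRUE);
    }
  }

  // Salting with something like the site private key keeps file names
  // unguessable if the directory is left unprotected.
  protected function path($cid) {
    return $this->dir . '/' . sha1($this->salt . $cid);
  }

  public function set($cid, $data, $expire = 0) {
    $record = array(
      'cid' => $cid,
      'created' => time(),
      'expire' => $expire,
      'data' => $data,
    );
    // Write to a temp file and rename so readers never see a partial file.
    $tmp = $this->path($cid) . '.tmp';
    file_put_contents($tmp, serialize($record));
    rename($tmp, $this->path($cid));
  }

  public function get($cid) {
    $file = $this->path($cid);
    if (!file_exists($file)) {
      return FALSE;
    }
    $record = unserialize(file_get_contents($file));
    // Honor per-entry expiration stored inside the file.
    if ($record['expire'] && $record['expire'] < time()) {
      unlink($file);
      return FALSE;
    }
    return $record;
  }
}
```

The temp-file-plus-rename in set() is what keeps concurrent readers from ever seeing a half-written cache entry.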
So it's pretty close to what you were proposing in https://github.com/backdrop/backdrop-issues/issues/1413, but it would be a standard cache-backend that could be used for any caching.
@quicksketch Please correct me, if I am missing anything here.
We need a FileCache implementation based on BackdropCacheInterface.
I can use BackdropDatabaseCache as an example implementation.
We need the following features:
BackdropCacheInterface::deletePrefix($prefix)
Please also let me know the reason for this; I have never had a need to clear cache by cache key prefix. If I understand the reason, I can find the best implementation for it.

Note: Keeping cache in files has unique features compared to a database:
Performance: Is there a particular reason to use SHA1 instead of MD5?
MD5 is a little bit faster than SHA1. See the following code and results:
<?php
// Micro-benchmark: hash ~1 MB of random data with sha1() and md5(),
// then rank the two by elapsed time (stored in nanoseconds).
echo 'Building random data ...' . PHP_EOL;
$data = '';
for ($i = 0; $i < 64000; $i++) {
  // 16 raw bytes per iteration, ~1 MB total.
  $data .= hash('md5', rand(), TRUE);
}

$results = array();

$time = microtime(TRUE);
sha1($data);
$time = microtime(TRUE) - $time;
$results[$time * 1000000000][] = 'sha1';

$time = microtime(TRUE);
md5($data);
$time = microtime(TRUE) - $time;
$results[$time * 1000000000][] = 'md5';

ksort($results);

echo PHP_EOL . PHP_EOL . 'Results: ' . PHP_EOL;
$i = 1;
foreach ($results as $k => $v) {
  foreach ($v as $v1) {
    echo ' ' . str_pad($i++ . '.', 4, ' ', STR_PAD_LEFT) . ' '
      . str_pad($v1, 30, ' ') . ($k / 1000) . ' microseconds' . PHP_EOL;
  }
}
Results:
1. md5 2204.895 microseconds
2. sha1 2443.075 microseconds
So it's pretty close to what you were proposing in #1413, but it would be a standard cache-backend that could be used for any caching.
Yes, I decided to write general cache backend interface and make it possible to select. See #1434
We need a FileCache implementation based on BackdropCacheInterface. I can use BackdropDatabaseCache as an example implementation.
Yep! That's the basic idea.
clean cached data by expire time per KEY, or keep it as PERMANENT if required.
This in particular is a required feature and the primary reason why using the database may still be necessary.
I never had a need to clean cache by cache key prefix. If I understand a reason, I can find best implementation to do so.
The reason for this is to be able to clear all caches related to a particular module without using a separate cache bin. For example if a module did several cache sets like this:
cache_set('my_module:foo:1', $data);
cache_set('my_module:foo:2', $data);
cache_set('my_module:foo:3', $data);
cache_set('my_module:bar:1', $data);
Then the cache entries within the "foo" group could be cleared with:
cache_clear_all('my_module:foo:', 'cache', TRUE);
// Or:
cache()->deletePrefix('my_module:foo:');
Deleting by cache prefix is not very common. Usually we set up dedicated cache bins these days. But the ability to have queryable cache entries is still required for setting and finding expiration times. In the future if we implement something similar to D8's cache tags (which I think is a good idea), then that functionality would replace the deletePrefix() approach.
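To make the prefix requirement concrete: because the on-disk names are hashes, a file backend needs a second, queryable record of the original keys. A rough sketch, with a hypothetical helper name and no locking around the index file:

```php
<?php
// Prefix deletion for a flat-file cache bin. Hashed file names destroy
// key ordering, so a serialized index of cid => hashed-name is kept
// alongside the cache files (illustrative layout, not Backdrop core).
function filecache_delete_prefix($dir, $prefix) {
  $index_file = $dir . '/.index';
  $index = file_exists($index_file)
    ? unserialize(file_get_contents($index_file))
    : array();

  $deleted = 0;
  foreach ($index as $cid => $hash) {
    if (strpos($cid, $prefix) === 0) {
      @unlink($dir . '/' . $hash); // Remove the cache file itself.
      unset($index[$cid]);         // And its index entry.
      $deleted++;
    }
  }
  file_put_contents($index_file, serialize($index));
  return $deleted;
}
```

This is exactly the "write in two locations" cost mentioned earlier: every set() must also update the index, which is what makes writes slower than the database backend.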
Is there particular reason to use SHA1 instead of MD5 ?
Mostly because SHA1 has a larger pool and lowers any chance of a key conflict. We'll only be calling SHA1 on cache set and get, which will likely only be a few times per request (if using it for the page cache). Performance-wise, in your test the difference between a single SHA1 and an MD5 is 0.00375ms. That shouldn't be an issue. I also have a small concern that cache entries may contain sensitive data, so that's the reason for salting with something like the site private key: if left unprotected, these file names would not be easily guessable.
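The salting argument in code form; the private key value and cache ID here are made up for illustration, and a real site would pull the key from its configuration:

```php
<?php
// Why the file name is sha1(private_key . cid) rather than sha1(cid).
$private_key = 'z8Pq-example-site-private-key'; // Made-up value.
$cid = 'cache_page:http://backdrop.local/';

$unsalted = sha1($cid);               // Guessable by anyone who knows the URL.
$salted = sha1($private_key . $cid);  // Requires knowing the private key.

// SHA1 also has a larger digest than MD5 (160 bits vs 128 bits,
// i.e. 40 vs 32 hex characters), lowering collision odds.
echo $salted . PHP_EOL;
```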
Note: Keeping cache in files has unique features compare to database:
That may be true, but as with storing in APC, Redis, Memcache, or any other cache backend, they each have unique features. In the case of a generic cache backend, we have to be able to implement the same features across all of them.
I agree. What I am trying to say is that tagging could be done without using the database at all, simply by creating a public://filecache/TAGNAME folder and symlinking keys from public://filecache/BIN into the TAG folder.
Then if we need to clear all keys by TAG, we simply read the TAGNAME folder, remove the files from the BIN folder, and remove all the symlinks from the TAG folder.
The same idea could be used for time expiration. We can create a dir public://filecache/EXPIRE/timestamp, symlink keys from BIN there, and in garbageCollection() get all directories with a timestamp less than REQUEST_TIME and clean up the files via the symlinks.
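For illustration, the garbage-collection half of this symlink proposal might look like the following. The EXPIRE/timestamp layout comes from the comment above; the function name and paths are illustrative, and a file system with POSIX symlink support is assumed:

```php
<?php
// Sketch of the proposed symlink-based expiration sweep. Each directory
// under EXPIRE/ is named for an expiration timestamp and holds symlinks
// to the real cache files in BIN.
function filecache_gc($base) {
  foreach (glob($base . '/EXPIRE/*', GLOB_ONLYDIR) as $dir) {
    // The directory name is the expiration timestamp of every link inside.
    if ((int) basename($dir) >= time()) {
      continue; // Not expired yet.
    }
    foreach (glob($dir . '/*') as $link) {
      @unlink(readlink($link)); // Remove the real cache file in BIN.
      unlink($link);            // Remove the symlink itself.
    }
    rmdir($dir);
  }
}
```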
I'm not keen on using symlinks personally. They can be tricky to manage for both developers and site administrators. In D8-land, cache entries for pages have around 50 tags per page (a unique combination of language, each node, comment, user, term, etc. shown on the entire page). If we took the same approach as D8, we'd end up with a symlink (cache tag) for every entity on the entire site. That'd make for a lot of symlinks.
They're also not universally supported across different file systems (e.g. FAT32). In any case, for the time being we don't have tags; we only have cache prefixes.
I see your point. I will think about it.