facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com
Other
18.18k stars 2.99k forks source link

stat(), is_dir(), realpath() have absolutely terrible performance #4431

Closed fredemmott closed 9 years ago

fredemmott commented 9 years ago
<?php

function main() {
  $start = microtime(true);
  for($i = 0; $i < 100000; ++$i) {
    is_dir(__FILE__);
    realpath(__FILE__);
    stat(__FILE__);
  }
  $end = microtime(true);
  var_dump($end - $start);
}

main();
[fredemmott@devbig076 ~] php5 test.php
float(0.24126386642456)
[fredemmott@devbig076 ~] phpng test.php
float(0.17168688774109)
[fredemmott@devbig076 ~] hhvm -v Eval.Jit=1 test.php
float(3.3344538211823)
[fredemmott@devbig076 ~] hhvm -v Eval.Jit=0 test.php
float(4.2481589317322)

Based on xhprof and 'perf', this is a major issue for Drupal7, Magento, and SugarCRM.

I'll be working on this. The user functions do not use statcache, but even if I forcibly change that, performance does not improve.

fredemmott commented 9 years ago

PHP5 has a cache for user calls that requires manual clearing (http://php.net/manual/en/function.clearstatcache.php). We have no cache as yet.

fredemmott commented 9 years ago

Based on a naive (uncommitable) implementation, this looks like it could be a 2-4x RPS improvement.

staabm commented 9 years ago

Rps?

simonwelsh commented 9 years ago

Requests per second

jazzdan commented 9 years ago

Is this rps improvement only realized when non-RepoAuth mode? Or does it apply regardless? We've recently implemented a realpath() cache to solve perf problems we were seeing in non-RepoAuth mode. After implementing we saw a similar increase in throughput.

We're planning on submitting it as a PR soon if you want to build off of that. Otherwise, happy to throw it out and replace it with something that caches stat() and is_dir() as well. :)

fredemmott commented 9 years ago

Test for is_dir() cache: https://gist.github.com/fredemmott/0914e077e12ea5a5b753 Unoptimized-but-correct implementation for is_dir() cache: https://gist.github.com/fredemmott/25bd4961567d922a8339

fredemmott commented 9 years ago

is_dir() cache: https://reviews.facebook.net/D30351

The huge performance win is only on systems that are running auditd - otherwise, it seems to vary by system between a 10% win and a 2% loss.

fredemmott commented 9 years ago

This approach needs #1669

fredemmott commented 9 years ago

largely fixed by https://github.com/facebook/hhvm/commit/0cc2c678fc0cd13443b8117d92eb2a6b438e3692 - cache overhead now seems like a bad tradeoff.

fredemmott commented 9 years ago

Well, this is now comparable, as is the drupal benchmark:

<?php
$start = microtime(true);
for ($i = 0; $i < 1000000; ++$i ) {
  clearstatcache();
 is_dir('/tmp');
}
$end = microtime(true);
var_dump($end - $start);

The original script still isn't, however as it no longer seems to be illustrative of a practical problem, it's not really important to us. Will re-open if new evidence shows that to be false.

joshuataylor commented 9 years ago

Just wanted to make a note anyone stumbling across this issue and is using Drupal.

If you're doing profiling with Drupal, and seeing thousands of dir finds with PHP, it's because there is probably a missing module somewhere.

You can use drush to find and remove missing modules: `drush dl clean_missing_modules

drush clean-modules`

This should remove those thousands of calls.

alex-moreno commented 8 years ago

this Drupal problem that you mention @joshuataylor should be mitigated by this patch: https://www.drupal.org/node/1081266?page=1

Just for the sake of information. We had some performance issues recently that were solved or mitigated by this.