caxy / php-htmldiff

A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables.
http://php-htmldiff.caxy.com
GNU General Public License v2.0

Major decrease in speed after upgrading from v0.1.5 to v0.1.7 #77

Closed: waltertamboer closed this issue 5 years ago

waltertamboer commented 6 years ago

We have an application where we generate a comparison between two files. The files are quite large; on v0.1.5 the comparison took 21 seconds. After upgrading to v0.1.7, the same comparison took 26 minutes.

After looking at https://github.com/caxy/php-htmldiff/compare/v0.1.5...v0.1.7, I suspect the multibyte string functions are the cause, but I did not dig any further into this issue. There is probably some inefficient part of the code that causes this huge increase in run time, but I don't have enough domain knowledge to pinpoint what's going on.

I'm more than happy to help out on this. Just let me know what I can do.

SavageTiger commented 6 years ago

Hi Walter,

I am pretty sure you are right that the multibyte support is causing a performance penalty. This was also noted when accepting the pull request; I think this is a trade-off that has to be made.

My advice for now: make sure you are using at least PHP 7, and maybe consider running the diff in a CLI process or as a microservice.

waltertamboer commented 6 years ago

Thanks for your advice, Sven. We are indeed on PHP 7, and the diff is generated in the background via a Beanstalkd job. The problem is that it's unacceptable for our customers to wait this long, so I'd love to find a solution for this.

jschroed91 commented 6 years ago

@SavageTiger @waltertamboer Do we think adding a config option to disable multibyte support would be worth doing?

I think that is something we could do immediately, but it's not the ideal solution. The ideal solution would be to make some of the performance improvements that require larger-scale changes, and those would take some time.

etaunknown commented 5 years ago

Some users with timeout issues got significant performance improvements by reverting this check to its v0.1.5 form with the following hack:

protected function isOnlyWhitespace($str)
{
    // Slightly faster than using preg_match
    // return $str !== '' && (mb_strlen(trim($str)) === 0);
    return $str !== '' && (strlen(trim($str)) === 0);
}
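Why the hack is safe: the check only asks whether the trimmed string is empty, and a string has zero characters exactly when it has zero bytes, so `strlen()` and `mb_strlen()` give the same answer here. One plausible reason for the speed difference is that PHP stores a string's byte length, so `strlen()` does not need to scan, while `mb_strlen()` on UTF-8 input generally has to walk the string to count characters. The sketch below is a hypothetical standalone function (not the library's class method) that expresses the same check as a direct comparison with the empty string, which avoids both length functions.

```php
<?php
// Hypothetical standalone version of the isOnlyWhitespace() check (not the
// library's class method): true when $str is non-empty and consists only of
// characters that trim() removes (" ", "\t", "\n", "\r", "\0", "\x0B").
function isOnlyWhitespace(string $str): bool
{
    // Comparing against '' is equivalent to checking that the length is 0,
    // and it needs neither strlen() nor mb_strlen().
    return $str !== '' && trim($str) === '';
}

// Print the result for a few sample inputs.
foreach (['', ' ', "\t\n", 'a', ' x '] as $sample) {
    printf("%s => %s\n", json_encode($sample), isOnlyWhitespace($sample) ? 'true' : 'false');
}
```

Note that `trim()` only strips ASCII whitespace, so a string made of a non-breaking space (U+00A0) is not treated as whitespace by either the original `mb_strlen()` version or this one.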

Some profiling information that pinpointed the suspected issue:

https://www.drupal.org/project/diff/issues/3030979#comment-12964298

SavageTiger commented 5 years ago

See PR: https://github.com/caxy/php-htmldiff/pull/81

jschroed91 commented 5 years ago

Fixes from #81 are included in new release v0.1.9

waltertamboer commented 5 years ago

Thanks! I'll try to test how these changes perform in our application asap.

SavageTiger commented 5 years ago

Josh, Packagist still shows 0.1.8 as the latest version. Not sure why.

jschroed91 commented 5 years ago

Hm, the connector must be outdated. Will update!

jschroed91 commented 5 years ago

@SavageTiger The new release is now updated on Packagist.