caxy / php-htmldiff

A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables
http://php-htmldiff.caxy.com
GNU General Public License v2.0
202 stars 51 forks source link

Latest release can cause segmentation faults #74

Closed danepowell closed 6 years ago

danepowell commented 6 years ago

A project I work on recently upgraded its dependency on caxy/php-htmldiff from 0.1.6 to 0.1.7. After the update, we are seeing segmentation faults when calling php-htmldiff. The fault actually occurs in the c standard library, not in the php binary.

The only change in the 0.1.7 release is this PR: https://github.com/caxy/php-htmldiff/pull/72

This might be a little hard to reproduce, because I'm actually not sure how php-htmldiff is being invoked. It's used by a dependency of a dependency of a dependency of a dependency on our project :smile:

But I'd still appreciate any input you might have on what could be causing segfaults with this change. Most likely a problem with the mbstring extension I'd imagine.

jschroed91 commented 6 years ago

Hmm... very interesting @danepowell.

Is it happening every time? Are you able to share examples of the HTML inputs that are going into htmldiff? Or is it happening for any input?

SavageTiger commented 6 years ago

Also it might be good to know what version of PHP this is happening on.

danepowell commented 6 years ago

It happens every time I run a certain command. Like I mentioned, it's hard to know what input is being passed to the library since I'm not calling it directly, but I'll try to find out.

I've reproduced it on PHP 7.1 and 7.2. Haven't tried other versions.

My hunch is that this is a out of memory error. I think I've seen this happen before with excessively long strings being passed to mbstring. But i really can't know for sure.

danepowell commented 6 years ago

I've narrowed it down quite a bit.

This is the line generating the segfault: https://github.com/caxy/php-htmldiff/blob/48c70a963e803b93fe68a191e62d0770b5446f0b/lib/Caxy/HtmlDiff/AbstractDiff.php#L598

The latest release works fine if I simply change that line back to mimic the old version: return ctype_alnum($str);

I'm still having trouble figuring out if or where that line is being called. Even if I put print('hello world'); or exit; statements in there, nothing seems to happen. It's like the line isn't being called at all, and yet it's generating a segfault. It's also possible the output is getting swallowed by SSH/TTY or something.

Edit: now it's segfaulting even if I have the old line in there, so it's possible I'm chasing a red herring here. I'll see what else I can find. Sorry for the noise...

SavageTiger commented 6 years ago

Maybe this is relevant: https://bugs.php.net/bug.php?id=65009&edit=1

If this is the case, best solution is to reimplement this method in another hopefully also unicode compatible way.

danepowell commented 6 years ago

I've refined my theory about what's happening here: I think this is actually a bug in xhprof that's being triggered by php-htmldiff. I'm in pretty well over my head on this though; can you review the evidence and let me know if you agree, and then I can try to open an upstream ticket with xhprof?

Here is a core dump and backtrace that I managed to pull: https://pastebin.com/4iVHND1G

The germane / proximal lines:

0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:38

38 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory. (gdb) bt

0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:38

1 0x00007f04b302b743 in memcpy (len=39, src=, __dest=0x7f04b0ae8358) at /usr/include/x86_64-linux-gnu/bits/string3.h:53

2 zend_string_init (persistent=0, len=, str=) at /usr/include/php/20160303/Zend/zend_string.h:160

3 hp_compile_file (file_handle=0x7ffc213c11a0, type=8) at /beetbox/workspace/7.1/xhprof-php7/extension/xhprof.c:1594

It appears that xhprof performs some sort of compilation or hooks into the opcode compilation. My theory is that xhprof is causing this segfault when it tries to read/compile that preg_match line in php-htmldiff. That would explain why php-htmldiff is involved despite apparently never being called.

To verify this, I tried repeatedly enabling/disabling the xhprof extension. Whenever xhprof is enabled, PHP segfaults. Whenever xhprof is disabled, no segfaults. So xhprof definitely seems involved (unless this is a memory problem or something more exotic).

SavageTiger commented 6 years ago

As far as I can tell the segterm happens while memory is allocated, and I imagon you came to the same conclusion. Not sure if this gdb dump has any value to us or the xhprof devs.

Best thing might be to try and create a standalone php file with a huge string in it and apply the same preg_match to it. If you get a segfault, you can supply that file as a reproducible situation to the xhprof maintainers.

danepowell commented 6 years ago

Whatever this is, it seems to not be a problem with this library so I can close this issue and declutter your queue. If you have any other insight I'd love to hear it though.

I'm 99% sure php-htmldiff is not even being called when the segfault occurs so I doubt this has anything to do with input to preg_match. I think this library just happened to trigger some sort of memory bug elsewhere in the stack. I opened an xhprof issue just in case that's the culprit: https://github.com/phacility/xhprof/issues/102

danepowell commented 6 years ago

Just closing the loop on this... I still have no idea what caused the segfaults, but a colleague started experiencing them even with the older version of caxy/php-htmldiff, so I'm pretty sure this library is completely innocent :)