facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com

Segfault in preg_replace_callback() #4767

Closed matzekuh closed 8 months ago

matzekuh commented 9 years ago
Host: ####
ProcessID: 900
ThreadID: 7fec863ff700
ThreadPID: 908
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: tags/HHVM-3.5.0-0-ga90f4733cfa0d8fefdafc7433f758f78cdc75424
DebuggerCount: 0

ThreadType: Web Request
Server_SERVER_NAME: ####
Server: ####
URL: /wiki/doku.php?id=wiki:syntax&do=debug

PHP Stacktrace:

#0  GeSHi->parse_non_string_part( &lt;<|UR1|"http://december<DOT>com/html/4/element/span<DOT>html"><|/2/>span|></a> style=) called at [/var/www/service/wiki/inc/geshi.php:2568]
#1  GeSHi->parse_code() called at [/var/www/service/wiki/inc/parserutils.php:726]
#2  p_xhtml_cached_geshi(This is some <span style="color:red;font-size:150%;">inline HTML</span>, html4strict, code) called at [/var/www/service/wiki/inc/parser/xhtml.php:543]
#3  Doku_Renderer_xhtml->html(
This is some <span style="color:red;font-size:150%;">inline HTML</span>
) called at [/var/www/service/wiki/inc/parserutils.php:607]
#4  p_render(xhtml, Array, ) called at [/var/www/service/wiki/inc/parserutils.php:113]
#5  p_cached_output(/var/www/service/wiki/data/pages/wiki/syntax.txt, xhtml, wiki:syntax) called at [/var/www/service/wiki/inc/parserutils.php:76]
#6  p_wiki_xhtml(wiki:syntax, 0, 1, ) called at [/var/www/service/wiki/inc/html.php:246]
#7  html_show() called at [/var/www/service/wiki/inc/template.php:105]
#8  tpl_content_core() called at [/var/www/service/wiki/inc/events.php:108]
#9  Doku_Event->trigger(tpl_content_core, 1) called at [/var/www/service/wiki/inc/events.php:231]
#10 trigger_event(TPL_ACT_RENDER, show, tpl_content_core) called at [/var/www/service/wiki/inc/template.php:82]
#11 tpl_content() called at [/var/www/service/wiki/lib/tpl/dokuwiki/main.php:59]
#12 include(/var/www/service/wiki/lib/tpl/dokuwiki/main.php) called at [/var/www/service/wiki/inc/actions.php:206]
#13 act_dispatch() called at [/var/www/service/wiki/doku.php:119]

paulbiss commented 9 years ago

There isn't a lot for us to go on here. The C++ stacktrace seems to be absent. If you can get us a stacktrace from a debug build, that would be great; if you can isolate a small piece of PHP that causes this crash, that would be even better. Thanks!

matzekuh commented 9 years ago

I installed hhvm-nightly-dbg. The problem only appears when loading the dokuwiki syntax page (doku.php?id=wiki:syntax). I will try to figure out what exactly happens before hhvm crashes.

Stacktrace:

Host: ####
ProcessID: 30201
ThreadID: 7f525f7ff700
ThreadPID: 30216
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/master-0-g163b2627434f48963ba334d90f668ffbc96e067e
DebuggerCount: 0

Server: ####
ThreadType: Web Request
Server_SERVER_NAME: ####
URL: /wiki/doku.php?id=wiki:syntax

# 0  bt_handler at /tmp/tmp.58XxoBHyUx/hphp/runtime/base/crash-reporter.cpp:71

doku.php:

<?php
/**
 * DokuWiki mainscript
 *
 * @license    GPL 2 (http://www.gnu.org/licenses/gpl.html)
 * @author     Andreas Gohr <andi@splitbrain.org>
 *
 * @global Input $INPUT
 */

// update message version
$updateVersion = 46.2;

//  xdebug_start_profiling();

if(!defined('DOKU_INC')) define('DOKU_INC', dirname(__FILE__).'/');

if(isset($_SERVER['HTTP_X_DOKUWIKI_DO'])) {
    $ACT = trim(strtolower($_SERVER['HTTP_X_DOKUWIKI_DO']));
} elseif(!empty($_REQUEST['idx'])) {
    $ACT = 'index';
} elseif(isset($_REQUEST['do'])) {
    $ACT = $_REQUEST['do'];
} else {
    $ACT = 'show';
}

// load and initialize the core system
require_once(DOKU_INC.'inc/init.php');

//import variables
$INPUT->set('id', str_replace("\xC2\xAD", '', $INPUT->str('id'))); //soft-hyphen
$QUERY          = trim($INPUT->str('id'));
$ID             = getID();

$REV   = $INPUT->int('rev');
$DATE_AT = $INPUT->str('at');
$IDX   = $INPUT->str('idx');
$DATE  = $INPUT->int('date');
$RANGE = $INPUT->str('range');
$HIGH  = $INPUT->param('s');
if(empty($HIGH)) $HIGH = getGoogleQuery();

if($INPUT->post->has('wikitext')) {
    $TEXT = cleanText($INPUT->post->str('wikitext'));
}
$PRE = cleanText(substr($INPUT->post->str('prefix'), 0, -1));
$SUF = cleanText($INPUT->post->str('suffix'));
$SUM = $INPUT->post->str('summary');

//parse DATE_AT
if($DATE_AT) {
    $date_parse = strtotime($DATE_AT);
    if($date_parse) {
        $DATE_AT = $date_parse;
    } else { // check for UNIX Timestamp
        $date_parse = @date('Ymd',$DATE_AT);
        if(!$date_parse || $date_parse === '19700101') {
            msg(sprintf($lang['unable_to_parse_date'], $DATE_AT));
            $DATE_AT = null;
        }
    }
}

//check for existing $REV related to $DATE_AT
if($DATE_AT) {
    $pagelog = new PageChangeLog($ID);
    $rev_t = $pagelog->getLastRevisionAt($DATE_AT);
    if($rev_t === '') { //current revision
        $REV = null;
        $DATE_AT = null;
    } else if ($rev_t === false) { //page did not exist
        $rev_n = $pagelog->getRelativeRevision($DATE_AT,+1);
        msg(sprintf($lang['page_nonexist_rev'],
            strftime($conf['dformat'],$DATE_AT),
            wl($ID, array('rev' => $rev_n)),
            strftime($conf['dformat'],$rev_n)));
        $REV = $DATE_AT; //will result in a page not exists message
    } else {
        $REV = $rev_t;
    }
}

//make infos about the selected page available
$INFO = pageinfo();

//export minimal info to JS, plugins can add more
$JSINFO['id']        = $ID;
$JSINFO['namespace'] = (string) $INFO['namespace'];

// handle debugging
if($conf['allowdebug'] && $ACT == 'debug') {
    html_debug();
    exit;
}

//send 404 for missing pages if configured or ID has special meaning to bots
if(!$INFO['exists'] &&
    ($conf['send404'] || preg_match('/^(robots\.txt|sitemap\.xml(\.gz)?|favicon\.ico|crossdomain\.xml)$/', $ID)) &&
    ($ACT == 'show' || (!is_array($ACT) && substr($ACT, 0, 7) == 'export_'))
) {
    header('HTTP/1.0 404 Not Found');
}

//prepare breadcrumbs (initialize a static var)
if($conf['breadcrumbs']) breadcrumbs();

// check upstream
checkUpdateMessages();

$tmp = array(); // No event data
trigger_event('DOKUWIKI_STARTED', $tmp);

//close session
session_write_close();

//do the work (picks up what to do from global env)
act_dispatch();

$tmp = array(); // No event data
trigger_event('DOKUWIKI_DONE', $tmp);

//  xdebug_dump_function_profile(1);

?>

matzekuh commented 9 years ago

Dokuwiki input that causes hhvm to crash:

<HTML>
This is some <span style="color:red;font-size:150%;">inline HTML</span>
</HTML>

Information about DokuWiki syntax: https://www.dokuwiki.org/wiki:syntax#embedding_html_and_php

Because HTML/PHP embedding is not allowed in the configuration, the output should be a code block showing the HTML code between the tags with highlighting. The same code in <html></html> tags also causes hhvm to crash.

Stacktrace:

Host: ####
ProcessID: 33427
ThreadID: 7fec87fff700
ThreadPID: 33432
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/master-0-g163b2627434f48963ba334d90f668ffbc96e067e
DebuggerCount: 0

Server: ####
ThreadType: Web Request
Server_SERVER_NAME: ####
URL: /test/dokuwiki/doku.php?id=start

# 0  bt_handler at /tmp/tmp.58XxoBHyUx/hphp/runtime/base/crash-reporter.cpp:71

PHP Stacktrace:

#0  GeSHi->parse_non_string_part( &lt;<|UR1|"http://december<DOT>com/html/4/element/span<DOT>html"><|/2/>span|></a> style=) called at [/var/www/service/test/dokuwiki/inc/geshi.php:2568]
#1  GeSHi->parse_code() called at [/var/www/service/test/dokuwiki/inc/parserutils.php:726]
#2  p_xhtml_cached_geshi(This is some <span style="color:red;font-size:150%;">inline HTML</span>, html4strict, pre) called at [/var/www/service/test/dokuwiki/inc/parser/xhtml.php:543]
#3  Doku_Renderer_xhtml->html(
This is some <span style="color:red;font-size:150%;">inline HTML</span>
, pre) called at [/var/www/service/test/dokuwiki/inc/parser/xhtml.php:555]
#4  Doku_Renderer_xhtml->htmlblock(
This is some <span style="color:red;font-size:150%;">inline HTML</span>
) called at [/var/www/service/test/dokuwiki/inc/parserutils.php:607]
#5  p_render(xhtml, Array, ) called at [/var/www/service/test/dokuwiki/inc/parserutils.php:113]
#6  p_cached_output(/var/www/service/test/dokuwiki/data/pages/start.txt, xhtml, start) called at [/var/www/service/test/dokuwiki/inc/parserutils.php:76]
#7  p_wiki_xhtml(start, 0, 1, ) called at [/var/www/service/test/dokuwiki/inc/html.php:246]
#8  html_show() called at [/var/www/service/test/dokuwiki/inc/template.php:105]
#9  tpl_content_core() called at [/var/www/service/test/dokuwiki/inc/events.php:108]
#10 Doku_Event->trigger(tpl_content_core, 1) called at [/var/www/service/test/dokuwiki/inc/events.php:231]
#11 trigger_event(TPL_ACT_RENDER, show, tpl_content_core) called at [/var/www/service/test/dokuwiki/inc/template.php:82]
#12 tpl_content() called at [/var/www/service/test/dokuwiki/lib/tpl/dokuwiki/main.php:59]
#13 include(/var/www/service/test/dokuwiki/lib/tpl/dokuwiki/main.php) called at [/var/www/service/test/dokuwiki/inc/actions.php:206]
#14 act_dispatch() called at [/var/www/service/test/dokuwiki/doku.php:119]

I'm almost sure that there are more syntax snippets that cause hhvm to crash, but I haven't managed to find all of them yet.

matzekuh commented 9 years ago

Further testing showed that the error is caused not by the <html></html> or <HTML></HTML> tags themselves but by their content. For example, the following code does not cause a crash:

<html>
<p style="border:2px dashed red;">And this is some block HTML</p>
</html>
<HTML>
<p style="border:2px dashed red;">And this is some block HTML</p>
</HTML>

EDIT: I tried the <html></html> block with different HTML tags. It looks like the <span> tag is the only tag that causes an error.

paulbiss commented 9 years ago

I suspect this may be related to #4108 but again, I don't really have a way of testing this. Generally when we're looking for an isolated example of a crash we're hoping for something short and in a single file that won't involve running an entire framework.

fredemmott commented 9 years ago

Is that the full C++ backtrace? There should be more lines than bt_handler.

matzekuh commented 9 years ago

@fredemmott Actually it is the full backtrace. I tried several times, the output did not change.

ghost commented 9 years ago

Is there any status update on this issue? I just ran into the exact same problem with hhvm 3.9.1 and dokuwiki 2015-08-10.

keepxtreme commented 9 years ago

I spent some time investigating further. HHVM crashes in the following part of GeSHi->parse_non_string_part():

foreach (array_keys($this->language_data['KEYWORDS']) as $k) {
    // {...}
    //NEW in 1.0.8, the cached regexp list
    // since we don't want PHP / PCRE to crash due to too large patterns we split them into smaller chunks
    for ($set = 0, $set_length = count($this->language_data['CACHED_KEYWORD_LISTS'][$k]); $set < $set_length; ++$set) {
        $keywordset =& $this->language_data['CACHED_KEYWORD_LISTS'][$k][$set];
        // Might make a more unique string for putting the number in soon
        // Basically, we don't put the styles in yet because then the styles themselves will
        // get highlighted if the language has a CSS keyword in it (like CSS, for example ;))
        $stuff_to_parse = preg_replace_callback(
            "/$disallowed_before_local({$keywordset})(?!\<DOT\>(?:htm|php|aspx?))$disallowed_after_local/$modifiers",
            array($this, 'handle_keyword_replace'),
            $stuff_to_parse
        );
    }
}

If I comment out $stuff_to_parse = preg_replace_callback(...) everything runs smoothly.

Edit: OK, it seems like HHVM runs into a recursion or something like that.

If I use the html4strict profile of GeSHi (which is what causes the problem in DokuWiki's case) and parse the string $string="string", the HTML output generated by this function is:

<pre class="html4strict" style="font-family:monospace;">string</pre>

For any other HTML tag:

<pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><a href="http://december.com/html/4/element/table.html"><span style="color: #000000; font-weight: bold;">table</span></a>&gt;</span></pre>

If I try to parse "<span>", "<pre>" or "</span>", HHVM crashes with a segfault (although "</pre>" does work, and I don't understand why).

If I now comment out the preg_replace_callback() call, the result of parsing "</span>" is:

<pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>span&gt;</span></pre>

This seems logical, as the keywords "span" and "pre" are defined in $language_data: https://github.com/GeSHi/geshi-1.0/blob/master/src/geshi/html4strict.php

Hope this is helpful...

JoelMarcey commented 9 years ago

Just a random thought -- is /e included in the $modifiers passed to the call to preg_replace_callback()?

keepxtreme commented 9 years ago

'CASE_SENSITIVE' => array(
        GESHI_COMMENTS => false,
        2 => false,
        3 => false,
        ),

$case_sensitive = $this->language_data['CASE_SENSITIVE'][$k];
$modifiers = $case_sensitive ? '' : 'i';
echo $modifiers."\n";

Results in (caused by the loop): i i

I also tried setting $modifiers="" but it still segfaults.

Edit: It's definitely related to $keywordset; if I set it to "" it works.

```php
$keywordset="a(?:bbr|cronym|ddress|pplet|rea)?|b(?:ase(?:font)?|do|ig|lockquote|ody|r|utton)?|c(?:aption|enter|ite|o(?:de|l(?:group)?))|d(?:d|el|fn|i[rv]|l|t)|em|f(?:ieldset|o(?:nt|rm)|rame(?:set)?)|h(?:1|2|3|4|5|6|ead|r|tml)|i(?:frame|layer|mg|n(?:put|s)|sindex)?|kbd|l(?:abel|egend|i(?:nk)?)|m(?:ap|eta)|no(?:frames|script)|o(?:bject|l|pt(?:group|ion))|p(?:aram|re)?|q|s(?:amp|cript|elect|mall|pan|t(?:r(?:ike|ong)|yle)|u[bp])?|t(?:able|body|d|ext(?:area)?|foot|h(?:ead)?|itle|r|t)|ul?|var";
```

OK, it seems to depend somehow on the length of $keywordset. If I delete characters at the beginning OR the end, at some point it starts working. Also, if I replace s(...) by o(...) it works too... weird. I guess there's a stack overflow or something like that inside this function.

Ah, and it's definitely a PHP 5 incompatibility, as php5 passes my test case while hhvm does not.
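For anyone trying to turn this into the short, single-file test case requested earlier in the thread, a minimal sketch might look like the following. It reuses the $keywordset above and the pattern shape from the quoted GeSHi call. The values of $disallowed_before_local and $disallowed_after_local are not given anywhere in this thread, so the ones below are assumed stand-ins, and highlight_keywords() with its bracket-wrapping callback is a hypothetical substitute for GeSHi's handle_keyword_replace. Whether this actually triggers the segfault on a given HHVM build has not been verified.

```php
<?php
// Keyword alternation copied verbatim from the comment above.
$keywordset = 'a(?:bbr|cronym|ddress|pplet|rea)?|b(?:ase(?:font)?|do|ig|lockquote|ody|r|utton)?|c(?:aption|enter|ite|o(?:de|l(?:group)?))|d(?:d|el|fn|i[rv]|l|t)|em|f(?:ieldset|o(?:nt|rm)|rame(?:set)?)|h(?:1|2|3|4|5|6|ead|r|tml)|i(?:frame|layer|mg|n(?:put|s)|sindex)?|kbd|l(?:abel|egend|i(?:nk)?)|m(?:ap|eta)|no(?:frames|script)|o(?:bject|l|pt(?:group|ion))|p(?:aram|re)?|q|s(?:amp|cript|elect|mall|pan|t(?:r(?:ike|ong)|yle)|u[bp])?|t(?:able|body|d|ext(?:area)?|foot|h(?:ead)?|itle|r|t)|ul?|var';

// Hypothetical helper, not part of GeSHi: applies one keyword pass to $input.
function highlight_keywords($input, $keywordset)
{
    // ASSUMPTION: stand-ins for GeSHi's $disallowed_before_local and
    // $disallowed_after_local; the real values are not shown in this thread.
    $disallowed_before_local = '(?<=&lt;|&lt;\/)';
    $disallowed_after_local = '(?=\s|\/|&gt;)';
    $modifiers = 'i'; // the thread reports 'i' for these keyword groups

    // Same pattern shape as the preg_replace_callback() call quoted earlier.
    $pattern = "/$disallowed_before_local({$keywordset})(?!\<DOT\>(?:htm|php|aspx?))$disallowed_after_local/$modifiers";

    return preg_replace_callback(
        $pattern,
        function ($match) {
            // Stand-in for handle_keyword_replace: wrap the keyword in brackets.
            return '[' . $match[1] . ']';
        },
        $input
    );
}

// Tags reported in the thread as working (table, /pre) and crashing (span, /span, pre).
foreach (array('table', '/pre', 'span', '/span', 'pre') as $tag) {
    $input = '&lt;' . $tag . '&gt;';
    echo $input, ' => ', var_export(highlight_keywords($input, $keywordset), true), "\n";
}
```

Running the same file under php5 and under hhvm and comparing the output would show whether the crash is reproducible without DokuWiki. If it is not, the assumed before/after fragments are the first thing to replace with the values GeSHi actually builds.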
matzekuh commented 9 years ago

The function seems to work fine during the first run of the foreach loop. During the second run the function fails.
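
One way to pin down which pass fails is to log each pass before it starts, so that the last log line identifies the call that was in flight when the process died. The sketch below is only an illustration: run_keyword_passes(), its simplified word-boundary pattern and the logger callback are hypothetical and not part of GeSHi. It also checks preg_last_error() for the case where PCRE reports an error instead of crashing.

```php
<?php
// Hypothetical helper: runs one preg_replace_callback() pass per keyword set,
// mirroring the nested loop structure of the GeSHi excerpt quoted above.
function run_keyword_passes($keyword_lists, $subject, $logger)
{
    foreach ($keyword_lists as $k => $sets) {
        foreach ($sets as $set => $keywordset) {
            // ASSUMPTION: simplified pattern; GeSHi builds a more complex one.
            $pattern = '/\b(' . $keywordset . ')\b/i';

            // Log before the call so the entry survives a crash inside PCRE.
            call_user_func($logger, sprintf(
                'group=%s set=%d pattern_bytes=%d',
                $k,
                $set,
                strlen($pattern)
            ));

            $result = preg_replace_callback(
                $pattern,
                function ($match) {
                    return '<|' . $match[1] . '|>';
                },
                $subject
            );

            if ($result === null) {
                // PCRE reported an error (for example a backtrack limit) without crashing.
                call_user_func($logger, 'preg error code ' . preg_last_error());
                continue;
            }
            $subject = $result;
        }
    }
    return $subject;
}
```

Passing 'error_log' as $logger writes each line to the server log, for example run_keyword_passes($lists, $text, 'error_log'), where $lists has the same shape as CACHED_KEYWORD_LISTS (group key, then a list of alternation strings).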

lexidor commented 8 months ago

Closing as there has been no progress in 8 years. If you or anyone reading this can manage to create a segfault with a known input (and regex), please file a new issue.