Long recursive calls cause afl to segfault

MegaManSec commented 3 years ago

As discussed in #14 the following grammar causes a segfault from AFL (maybe only on startup?): https://paste.pr0.tips/rm

This is due to really long recursion:

#764 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#765 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#766 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#767 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#768 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#769 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#770 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#771 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#772 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#773 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#774 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#775 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#776 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so

huanggh666 commented 2 years ago

I meet a similar problem. Is there a solutions?

0x7Fancy commented 6 months ago

I'm stuck here too (this is so long time callback.XD

based on the problems in my own scenario and reviewing #14, #28, and #36, I think that the current recursive grammar provided by Grammar-Mutator easily increases the backtracking cost during syntax tree parsing during mutation, and eventually gets stuck in the grammar parsing process.

I make a minimal case to reproduce the above problem;

(because json can be regarded as a one-to-one correspondence with g4, for the convenience of description, I will use the g4 syntax of antlr4 for expression below.)

I make g4 grammar test.g4:

grammar test;
entry: stmt1 '\n' EOF
     ;
stmt1: 'I' SPACE stmt2 SPACE 'like' SPACE 'C++'
     ;
stmt2: NODE
     | NODE SPACE
     | NODE SPACE NODE
     | NODE SPACE NODE SPACE
     | NODE SPACE NODE SPACE NODE
     | NODE SPACE stmt2
     ;
NODE : 'very'
     | 'little'
     ;
SPACE: ' '
     ;

and I make input 1024_very.txt:

I very very ...(*1020)... very very like C++

use antlr4-parse to parse as follows: Screen Shot 2024-01-08 at 17 28 16

The above is a special case combination that can construct similar stack calls. In fact, we only need to write simple recursion normally, but reproducing the problem may require more very

after analysis, it can be found that when antlr4 analyzes the above grammar, each round of recursion will check prefix matching, and its consumption of performance increases exponentially with the increase of recursion depth; it may eventually cause three phenomena:

stack call explosion
100% cpu usage
high memory usage

0x7Fancy commented 6 months ago

going further, I found a way to mitigate;

based on the above issues, we create simpler test cases, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

tanslate to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and input 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse: Screen Shot 2024-01-08 at 17 56 39

from the perspective of antlr4, we can use the + syntax to describe test.g4, and ignore this prefix matching, as follows test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse: Screen Shot 2024-01-08 at 17 59 40

so I made a patch to implement the above ideas, please refer to https://github.com/0x7Fancy/Grammar-Mutator/commit/6eae7d14f579b4d3d1196fc1a288c61044a1afe1;

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. for intermediate recursion, I think it can be rewritten as head/tail recursion in json

of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.

0x7Fancy commented 6 months ago

@h1994st if you think it's ok, I can send PR

h1994st commented 6 months ago

@0x7Fancy Thanks for digging into this issue! Such an old issue lol

The fix looks good to me. Please go ahead and create a PR. I will merge it then.

MegaManSec commented 6 months ago

I don't have the ability to test at the moment, but is there any chance somebody could check whether the HTTP grammar I created works with the patch?

{
    "<A>": [["<START_LINE>", "\r\n", "<HEADERS>", "\r\n"]],

    "<START_LINE>": [["<METHOD>", " ", "<URI>", " ", "<VERSION>"]],

    "<METHOD>": [["NONE"], ["GET"], ["POST"], ["PUT"], ["HEAD"], ["CONNECT"], ["TRACE"], ["OPTIONS"], ["DELETE"], ["LINK"], ["UNLINK"], ["CHECKOUT"], ["CHECKIN"], ["UNCHECKOUT"], ["MKWORKSPACE"], ["VERSION"], ["REPORT"], ["UPDATE"], ["LABEL"], ["MERGE"], ["BASELINE"], ["MKACTIVITY"], ["ORDERPATCH"], ["ACL"], ["MKREDIRECTREF"], ["UPDATEREDIRECTREF"], ["MKCALENDAR"], ["PROPFIND"], ["PROPPATCH"], ["MKCOL"], ["COPY"], ["MOVE"], ["LOCK"], ["UNLOCK"], ["SEARCH"], ["PATCH"], ["BIND"], ["REBIND"], ["UNBIND"], ["PRI"], ["PURGE"], ["OTHER"], ["ENUM"]],

    "<URI>": [["<SCHEME>" , ":", "<HIER>", "<QUERY>", "<FRAGMENT>"]],

    "<SCHEME>": [["NONE"], ["cache_object"], ["HTTP"], ["FTP"], ["HTTPS"], ["COAP"], ["COAPS"], ["GOPHER"], ["WAIS"], ["CACHE"], ["ICP"], ["HTCP"], ["URN"], ["WHOIS"], ["ICY"], ["TLS"], ["SSL"], ["AUTHORITY"], ["UNKNOWN"], ["MAX"]],

    "<HIER>": [["//", "<AUTHORITY>", "<PATH>"]],

    "<AUTHORITY>": [["<USERINFO>", "<HOST>"]],

    "<PATH>": [["/", "<DIR>"]],

    "<DIR>": [[], ["<CHAR>", "/", "<DIR>"]],

    "<USERINFO>": [[], ["<CHAR>", ":", "<CHAR>", "@"]],

    "<HOST>": [["127.0.0.1:8080"]],

    "<QUERY>": [[], ["?", "<CHAR>" , "=", "<CHAR>"]],

    "<FRAGMENT>": [[], ["#", "<CHAR>"]],

    "<VERSION>": [["HTTP/0.9"], ["HTTP/1.0"], ["HTTP/1.1"], ["HTTP/2.0"], ["ICY"]],

    "<HEADERS>": [[], ["<HEADER>", "\r\n", "<HEADERS>"]],

    "<HEADER>": [["<HEADER_FIELD>", ": ", "<ANY>"]],

    "<HEADER_FIELD>": [["Accept"],["Accept-Charset"],["Accept-Encoding"],["Accept-Language"],["Accept-Ranges"],["Age"],["Allow"],["Alternate-Protocol"],["Authentication-Info"],["Authorization"],["Cache-Control"],["Cache-Status"],["CDN-Loop"],["Connection"],["Content-Base"],["Content-Disposition"],["Content-Encoding"],["Content-Language"],["Content-Length"],["Content-Location"],["Content-MD5"],["Content-Range"],["Content-Type"],["Cookie"],["Cookie2"],["Date"],["ETag"],["Expect"],["Expires"],["Forwarded"],["From"],["Host"],["HTTP2-Settings"],["If-Match"],["If-Modified-Since"],["If-None-Match"],["If-Range"],["If-Unmodified-Since"],["Keep-Alive"],["Key"],["Last-Modified"],["Link"],["Location"],["Max-Forwards"],["Mime-Version"],["Negotiate"],["Origin"],["Pragma"],["Proxy-Authenticate"],["Proxy-Authentication-Info"],["Proxy-Authorization"],["Proxy-Connection"],["Proxy-support"],["Public"],["Range"],["Referer"],["Request-Range"],["Retry-After"],["Server"],["Set-Cookie"],["Set-Cookie2"],["TE"],["Title"],["Trailer"],["Transfer-Encoding"],["Translate"],["Unless-Modified-Since"],["Upgrade"],["User-Agent"],["Vary"],["Via"],["Warning"],["WWW-Authenticate"],["X-Forwarded-For"],["X-Request-URI"],["X-Squid-Error"],["X-Accelerator-Vary"],["X-Next-Services"],["Surrogate-Capability"],["Surrogate-Control"],["Front-End-Https"],["FTP-Command"],["FTP-Arguments"],["FTP-Pre"],["FTP-Status"],["FTP-Reason"],["Other"]],

    "<BODY>": [[], ["<CHAR>"]],

    "<ANY>": [[], ["<DATE>"], ["<CHAR>"], ["<HOST>"], ["<URI>"], ["<SPECIAL>"]],

    "<SPECIAL>": [["identity"], ["gzip"], ["deflate"], ["compress"], ["chunked"], ["bytes"], ["<SPECIAL>", ",", "<SPECIAL>"], [",", "<SPECIAL>"], ["\"", "<SPECIAL>", "\""], ["<SPECIAL>", "=", "<SPECIAL>"]],

    "<DATE>": [["Sat, 29 Oct 1994 19:43:31 GMT"], ["Sat, 29 Oct 2021 19:43:31 GMT"]],

    "<CHAR>": [[], ["<TCHAR>", "<CHAR>"]],

    "<TCHAR>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"], ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"], ["l"], ["m"], ["n"], ["o"], ["p"], ["q"], ["r"], ["s"], ["t"], ["u"], ["v"], ["w"], ["x"], ["y"], ["z"], ["A"], ["B"], ["C"], ["D"], ["E"], ["F"], ["G"], ["H"], ["I"], ["J"], ["K"], ["L"], ["M"], ["N"], ["O"], ["P"], ["Q"], ["R"], ["S"], ["T"], ["U"], ["V"], ["W"], ["X"], ["Y"], ["Z"], ["!"], ["\""], ["#"], ["$"], ["%"], ["&"], ["'"], ["("], [")"], ["*"], ["+"], [","], ["-"], ["."], ["/"], [":"], [";"], ["<"], ["="], [">"], ["?"], ["@"], ["["], ["\\"], ["]"]]
}

0x7Fancy commented 6 months ago

emmmmm, wait a moment, this patch seems to still have problems, I will continue to follow up

0x7Fancy commented 6 months ago

@MegaManSec hi, I reproduced your original problem("causes a segfault from AFL (maybe only on startup)")

I don’t know much about LL(*) grammar (antlr4). Based on my understanding of LL(1) grammar, I think there are some problems in your grammar file, causing antlr4 to fall into the backtracking parsing process.

based on the json grammar you provided, I found the minimal test case (g4):

grammar test;
entry
    : node_DIR 'a' node_CHAR '=' '\n' EOF
    ;
node_DIR
    :
    | node_CHAR '/' node_DIR
    ;
node_CHAR
    : 
    | node_TCHAR node_CHAR
    ;
node_TCHAR
    : '='
    ;

and the input data poc is:

///=

running with antlr4-parse: Screen Shot 2024-01-16 at 18 30 17

in the test case, node_DIR and node_CHAR both can be ε，this is not allowed for LL(1) grammars, which may lead to infinite backtracking parsing, optimize it:

grammar test;
entry
    : node_DIR 'a' node_CHAR '=' '\n' EOF
    ;
node_DIR
    : node_CHAR
    | '/' node_DIR
    ;
node_CHAR
    : 
    | node_TCHAR node_CHAR
    ;
node_TCHAR
    : '='
    ;

it works!!!

ok, optimize json syntax like this:

-       "<DIR>": [[], ["<CHAR>", "/", "<DIR>"]],
+       "<DIR>": [["<CHAR>"], ["/", "<DIR>"]],

-       "<USERINFO>": [[], ["<CHAR>", ":", "<CHAR>", "@"]],
+       "<USERINFO>": [[], ["ui", "<CHAR>", ":", "<CHAR>", "@"]],

-       "<BODY>": [[], ["<CHAR>"]],
+       "<BODY>": [["body"], ["<CHAR>"]],

-       "<ANY>": [[], ["<DATE>"], ["<CHAR>"], ["<HOST>"], ["<URI>"], ["<SPECIAL>"]],
+       "<ANY>": [["any"], ["<DATE>"], ["<CHAR>"], ["<HOST>"], ["<URI>"], ["<SPECIAL>"]],

it works!!!

PS: please using https://github.com/AFLplusplus/Grammar-Mutator/commit/ff4e5a265daf5d88c4a636fb6a2c22b1d733db09, above patch maybe a mistake, I will continue to follow up

AFLplusplus / Grammar-Mutator

Long recursive calls cause afl to segfault #17