antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Indirection in Rule Causes Container Overflow (ASAN, C++ Target) #2332

Open sanssecours opened 6 years ago

sanssecours commented 6 years ago

Problem Description

I am currently working on a project to create an ANTLR grammar for a small subset of YAML. For this purpose I created a custom lexer in C++, which feeds tokens to a parser generated by ANTLR. Since I want to keep the code relatively bug-free, I enabled the AddressSanitizer provided by Clang. Until recently the address sanitizer only reported issues caused by my code. However, the last minor update to the grammar:

--- a/Grammar/YAML.g4
+++ b/Grammar/YAML.g4
@@ -8,6 +8,8 @@ yaml : STREAM_START child? STREAM_END EOF ;
 child : scalar | map;

 map : MAPPING_START pair BLOCK_END ;
-pair : KEY scalar VALUE scalar ;
+pair : KEY key VALUE value ;
+key : scalar ;
+value : scalar ;

 scalar : PLAIN_SCALAR;

seems to cause a container overflow in the code produced by ANTLR.

Unfortunately, my project already requires some dependencies. Hopefully the bug report below is still useful, even though the code that shows the problem could probably be reduced considerably.

Steps to Reproduce

  1. Install dependencies for code (macOS)

    brew install antlr antlr4-cpp-runtime cmake elektra llvm ninja spdlog

  2. Check out the code

    git clone --branch overflow https://github.com/sanssecours/Yan-LR.git

  3. Generate build system and build executable

    cd Yan-LR
    make configure compile

  4. Run the executable using the input causing problems

    Build/badger 'Input/Disabled/Single Simple Mapping.yaml'

Expected Result

The executable prints

. The address sanitizer should not report any problems.

Actual Result

The executable prints

.

=================================================================
==13988==ERROR: AddressSanitizer: container-overflow on address 0x60c000000a48 at pc 0x000101e9341b bp 0x7ffeee641d80 sp 0x7ffeee641520
READ of size 128 at 0x60c000000a48 thread T0
    #0 0x101e9341a in wrap_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x1c41a)
    #1 0x101d6d0a7 in std::__1::vector<antlr4::Token*, std::__1::allocator<antlr4::Token*> >::__swap_out_circular_buffer(std::__1::__split_buffer<antlr4::Token*, std::__1::allocator<antlr4::Token*>&>&) (libantlr4-runtime.4.7.1.dylib:x86_64+0x70a7)
    #2 0x101d78309 in void std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::__push_back_slow_path<antlr4::tree::ParseTree*>(antlr4::tree::ParseTree*&&) (libantlr4-runtime.4.7.1.dylib:x86_64+0x12309)
    #3 0x101d7767b in antlr4::tree::TerminalNodeImpl* antlr4::tree::ParseTreeTracker::createInstance<antlr4::tree::TerminalNodeImpl, antlr4::Token*&>(antlr4::Token*&&&) (libantlr4-runtime.4.7.1.dylib:x86_64+0x1167b)
    #4 0x101d765f3 in antlr4::Parser::consume() (libantlr4-runtime.4.7.1.dylib:x86_64+0x105f3)
    #5 0x101d7586a in antlr4::Parser::match(unsigned long) (libantlr4-runtime.4.7.1.dylib:x86_64+0xf86a)
    #6 0x1015c3cb4 in antlr::YAML::yaml() (badger:x86_64+0x100007cb4)
    #7 0x10169dce1 in main (badger:x86_64+0x1000e1ce1)
    #8 0x7fff70a02014 in start (libdyld.dylib:x86_64+0x1014)

0x60c000000a80 is located 0 bytes to the right of 128-byte region [0x60c000000a00,0x60c000000a80)
allocated by thread T0 here:
    #0 0x101ed9ca2 in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x62ca2)
    #1 0x101605a35 in std::__1::__split_buffer<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<antlr4::tree::ParseTree*>&) (badger:x86_64+0x100049a35)
    #2 0x1015ff988 in std::__1::__split_buffer<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<antlr4::tree::ParseTree*>&) (badger:x86_64+0x100043988)
    #3 0x1015fd225 in void std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::__push_back_slow_path<antlr4::tree::ParseTree*>(antlr4::tree::ParseTree*&&) (badger:x86_64+0x100041225)
    #4 0x1015e17e2 in antlr::YAML::ScalarContext* antlr4::tree::ParseTreeTracker::createInstance<antlr::YAML::ScalarContext, antlr4::ParserRuleContext*&, unsigned long>(antlr4::ParserRuleContext*&&&, unsigned long&&) (badger:x86_64+0x1000257e2)
    #5 0x1015cc4a6 in antlr::YAML::scalar() (badger:x86_64+0x1000104a6)
    #6 0x1015d9c8b in antlr::YAML::key() (badger:x86_64+0x10001dc8b)
    #7 0x1015d3659 in antlr::YAML::pair() (badger:x86_64+0x100017659)
    #8 0x1015ce499 in antlr::YAML::map() (badger:x86_64+0x100012499)
    #9 0x1015c694d in antlr::YAML::child() (badger:x86_64+0x10000a94d)
    #10 0x1015c2d3a in antlr::YAML::yaml() (badger:x86_64+0x100006d3a)
    #11 0x10169dce1 in main (badger:x86_64+0x1000e1ce1)
    #12 0x7fff70a02014 in start (libdyld.dylib:x86_64+0x1014)

HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x1c41a) in wrap_memcpy
Shadow bytes around the buggy address:
  0x1c18000000f0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x1c1800000100: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
  0x1c1800000110: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c1800000120: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x1c1800000130: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
=>0x1c1800000140: 00 00 00 00 00 00 00 00 00[fc]fc 00 00 fc fc fc
  0x1c1800000150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1800000160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1800000170: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1800000180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1800000190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==13988==ABORTING
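
To make the addresses in the report above easier to interpret, here is a minimal sketch that turns the printed heap region and the faulting address into element counts. The `Region` type and the helper names are hypothetical and not part of ANTLR or AddressSanitizer, and the 8-byte element size is an assumption (pointers on x86_64). It only does the arithmetic; it says nothing about the cause of the report.

```cpp
#include <cassert>
#include <cstdint>

// One heap region as printed by AddressSanitizer: [begin, end).
struct Region {
    std::uint64_t begin;
    std::uint64_t end;
};

// Number of elements of `element_size` bytes that fit into the region.
inline std::uint64_t capacity_in_elements(const Region& r,
                                          std::uint64_t element_size) {
    assert(element_size > 0 && r.end >= r.begin);
    return (r.end - r.begin) / element_size;
}

// Zero-based index of the element that contains `address`.
inline std::uint64_t element_index(const Region& r, std::uint64_t address,
                                   std::uint64_t element_size) {
    assert(element_size > 0 && address >= r.begin && address < r.end);
    return (address - r.begin) / element_size;
}

// Region copied from the report above:
// "128-byte region [0x60c000000a00,0x60c000000a80)".
inline Region first_report_region() {
    return Region{0x60c000000a00ULL, 0x60c000000a80ULL};
}
```

With 8-byte pointers, `capacity_in_elements(first_report_region(), 8)` gives 16, and `element_index(first_report_region(), 0x60c000000a48ULL, 8)` gives 9. So the reported address 0x60c000000a48 lies 72 bytes into a buffer with room for 16 pointers, which matches the nine `00` shadow bytes before the `[fc]` marker in the shadow dump.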

Additional Information

The commands below show that the grammar update caused the container overflow:

git checkout 6726bd25
make compile
Build/badger 'Input/Disabled/Single Simple Mapping.yaml'

, since the last command does not print any errors reported by the address sanitizer.



gavrilikhin-d commented 1 year ago

I also have an issue with container overflow. Why is this still open?

=================================================================
==18848==ERROR: AddressSanitizer: container-overflow on address 0x000105e04a08 at pc 0x00010456ccc4 bp 0x00016d316330 sp 0x00016d315ae8
READ of size 32 at 0x000105e04a08 thread T0
    #0 0x10456ccc0 in __asan_memcpy+0x1a4 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ccc0)
    #1 0x102dd45e4 in void std::__1::__construct_backward_with_exception_guarantees<std::__1::allocator<antlr4::tree::ParseTree*>, antlr4::tree::ParseTree*, void>(std::__1::allocator<antlr4::tree::ParseTree*>&, antlr4::tree::ParseTree**, antlr4::tree::ParseTree**, antlr4::tree::ParseTree**&) memory:799
    #2 0x102dd39b4 in std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::__swap_out_circular_buffer(std::__1::__split_buffer<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*>&>&) vector:976
    #3 0x103c13b78 in void std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::__push_back_slow_path<antlr4::tree::ParseTree*>(antlr4::tree::ParseTree*&&) vector:1650
    #4 0x103c138fc in std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::push_back(antlr4::tree::ParseTree*&&) vector:1678
    #5 0x103c08460 in antlr4::tree::TerminalNodeImpl* antlr4::tree::ParseTreeTracker::createInstance<antlr4::tree::TerminalNodeImpl, antlr4::Token*&>(antlr4::Token*&) ParseTree.h:95
    #6 0x103c06864 in antlr4::Parser::createTerminalNode(antlr4::Token*) Parser.cpp:652
    #7 0x103c06760 in antlr4::Parser::consume() Parser.cpp:337
    #8 0x102dc1164 in ppl::syntax::PPLParser::statement() PPLParser.cpp:226
    #9 0x102aef3ec in main test.cpp:162
    #10 0x102c79088 in start+0x204 (dyld:arm64e+0x5088)

0x000105e04a10 is located 0 bytes to the right of 32-byte region [0x000105e049f0,0x000105e04a10)
allocated by thread T0 here:
    #0 0x10457dea0 in wrap__Znwm+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4dea0)
    #1 0x102dc9b14 in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) new:235
    #2 0x102dc9a00 in std::__1::__libcpp_allocate(unsigned long, unsigned long) new:261
    #3 0x102dd4450 in std::__1::allocator<antlr4::tree::ParseTree*>::allocate(unsigned long) allocator.h:108
    #4 0x102dd4248 in std::__1::allocator_traits<std::__1::allocator<antlr4::tree::ParseTree*> >::allocate(std::__1::allocator<antlr4::tree::ParseTree*>&, unsigned long) allocator_traits.h:262
    #5 0x102dd3ff4 in std::__1::__split_buffer<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<antlr4::tree::ParseTree*>&) __split_buffer:315
    #6 0x102dd38f0 in std::__1::__split_buffer<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<antlr4::tree::ParseTree*>&) __split_buffer:314
    #7 0x102dd2c5c in void std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::__push_back_slow_path<antlr4::tree::ParseTree*>(antlr4::tree::ParseTree*&&) vector:1646
    #8 0x102dd2740 in std::__1::vector<antlr4::tree::ParseTree*, std::__1::allocator<antlr4::tree::ParseTree*> >::push_back(antlr4::tree::ParseTree*&&) vector:1678
    #9 0x102dc3f90 in ppl::syntax::PPLParser::AtomContext* antlr4::tree::ParseTreeTracker::createInstance<ppl::syntax::PPLParser::AtomContext, antlr4::ParserRuleContext*&, unsigned long>(antlr4::ParserRuleContext*&, unsigned long&&) ParseTree.h:95
    #10 0x102dc30ac in ppl::syntax::PPLParser::atom() PPLParser.cpp:310
    #11 0x102dc20c8 in ppl::syntax::PPLParser::expression() PPLParser.cpp:269
    #12 0x102dc0c90 in ppl::syntax::PPLParser::statement() PPLParser.cpp:216
    #13 0x102aef3ec in main test.cpp:162
    #14 0x102c79088 in start+0x204 (dyld:arm64e+0x5088)

HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ccc0) in __asan_memcpy+0x1a4
Shadow bytes around the buggy address:
  0x007020be08f0: fd fd fd fa fa fa fd fd fd fa fa fa 00 00 00 00
  0x007020be0900: fa fa 00 00 00 00 fa fa 00 00 00 fa fa fa fd fd
  0x007020be0910: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
  0x007020be0920: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fa
  0x007020be0930: fa fa fd fd fd fd fa fa 00 00 00 00 fa fa 00 00
=>0x007020be0940: 00[fc]fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
  0x007020be0950: 00 00 00 fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x007020be0960: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd
  0x007020be0970: fd fa fa fa fd fd fd fa fa fa 00 00 00 fa fa fa
  0x007020be0980: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x007020be0990: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18848==ABORTING