CESNET / libyang

YANG data modeling language library
BSD 3-Clause "New" or "Revised" License
364 stars 291 forks source link

buffer overflow in the XML parser #142

Closed jktjkt closed 7 years ago

jktjkt commented 7 years ago

TL;DR: calling something like memcmp(input, "<![CDATA[", 9) fails if input contains less than 9 characters.

lyxml_parse_elem calls memcmp in its process loop in a way which makes it easy to read behind the end of the user input. Here's what ASAN says:

==3959==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000526daa at pc 0x000000483b3c bp 0x7ffc4f74ba10 sp 0x7ffc4f74b1c0
READ of size 9 at 0x000000526daa thread T0
    #0 0x483b3b in __interceptor_memcmp /var/tmp/portage/sys-devel/llvm-3.9.0/work/llvm-3.9.0.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:628
    #1 0x7f6eb9345b5e in lyxml_parse_elem /home/jkt/work/prog/libyang/build/../src/xml.c:1015:25
    #2 0x7f6eb934b76b in lyxml_parse_mem /home/jkt/work/prog/libyang/build/../src/xml.c:1172:12
    #3 0x511090 in test_lyxml_get_attr /home/jkt/work/prog/libyang/build/../tests/api/test_xml.c:406:11
    #4 0x7f6eb97a6d96 in cmocka_run_one_test_or_fixture /var/tmp/portage/dev-util/cmocka-1.0.1/work/cmocka-1.0.1/src/cmocka.c:2305
    #5 0x7f6eb97a74e7 in cmocka_run_one_tests /var/tmp/portage/dev-util/cmocka-1.0.1/work/cmocka-1.0.1/src/cmocka.c:2413
    #6 0x7f6eb97a74e7 in _cmocka_run_group_tests /var/tmp/portage/dev-util/cmocka-1.0.1/work/cmocka-1.0.1/src/cmocka.c:2518
    #7 0x50fe74 in main /home/jkt/work/prog/libyang/build/../tests/api/test_xml.c:454:12
    #8 0x7f6eb8168733 in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.22-r4/work/glibc-2.22/csu/libc-start.c:289
    #9 0x419bc8 in _start (/home/jkt/work/prog/libyang/build/tests/test_xml+0x419bc8)

0x000000526daa is located 54 bytes to the left of global variable '<string literal>' defined in '../tests/api/test_xml.c:48:23' (0x526de0) of size 41
  '<string literal>' is ascii string '<x xmlns="urn:a"><bubba>test</bubba></x>'
0x000000526daa is located 0 bytes to the right of global variable '<string literal>' defined in '../tests/api/test_xml.c:43:31' (0x526d60) of size 74
  '<string literal>' is ascii string '<x xmlns="urn:a" bubba="test">
  <bubba xmlns="urn:a" name="test"/>
</x>

I tried to read through the project history for a rationale for coming up with a hand-crafted XML parser. I understand that libxml2 was rejected due to the worries about its resource usage. I wonder if you gave consideration to other existing XML parsers, especially those which are streaming in nature? Expat, for example?

rkrejci commented 7 years ago

Yes, we were considering usage of a third-party XML parser. However, because YANG modeled XML data (and YIN format of the schema) has several limitation from the XML point of view (no mixed content, no PI, no CDATA sections), we have decided to write internal parser that can be tighten to the YANG data needs. The data structures can be smaller than in most XML parsers and as a bonus we avoid a dependency on external library.