fletcher / peg-multimarkdown

An implementation of MultiMarkdown in C, using a PEG grammar - a fork of jgm's peg-markdown. No longer under active development - see MMD 5.

segfault on very long lines? #121

Closed abartov closed 10 years ago

abartov commented 12 years ago

I'm encountering a segfault on an input file. I've tried several combinations, and what seems to trigger the segfault is the very long lines: my input has hugely long paragraphs.

I'll attach a crashing input file. Here's a backtrace from a build with debug info (the release build crashes as well). I apologize that the actual input is probably not readable to you: it is a UTF-8 Hebrew file. Other UTF-8 Hebrew files, with shorter paragraphs, parse and render without a hitch.

    asaf@manutius:~/ko$ multimarkdown_dbg bisect_mmd.txt
    Segmentation fault (core dumped)
    asaf@manutius:~/ko$ gdb multimarkdown_dbg core
    GNU gdb (GDB) 7.4.1-debian
    Copyright (C) 2012 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i486-linux-gnu".
    For bug reporting instructions, please see:
    http://www.gnu.org/software/gdb/bugs/...
    Reading symbols from /usr/local/bin/multimarkdown_dbg...done.

    warning: core file may not match specified executable file.
    [New LWP 2018]

    warning: Can't read pathname for load map: Input/output error.
    Core was generated by `multimarkdown_dbg bisect_mmd.txt'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x08048c1b in reverse (list=0x9dd799d7) at utility_functions.c:29
    29 next = list->next;
    (gdb) bt
    #0 0x08048c1b in reverse (list=0x9dd799d7) at utility_functions.c:29
    #1 0x08048e7c in mk_list (key=0, lst=0x9b4fc00) at utility_functions.c:126
    #2 0x0804c42f in yy_3_Str (yytext=0x997bee0 "", yyleng=0) at markdown_parser.c:1874
    #3 0x08049d8f in yyDone () at markdown_parser.c:211
    #4 0x08068e7b in yyparsefrom (yystart=0x80685d4 ) at markdown_parser.c:9189
    #5 0x08069156 in parse_markdown_with_metadata (
        string=0xb74b9008 "#××ק×ר×× ×× ××קר×\r\n##נעתק ××ש×× ×¢×¨××ת ××ש×× ×¢×ר×ת ×××××ר\r\n###××ת\r\n##צ×× ×××××××\r\n×××× × "..., extensions=3, reference_list=0x0, note_list=0x997b480, label_list=0x9a03210) at parsing_functions.c:147
    #6 0x08072082 in markdown_to_g_string (
        text=0xb753a008 "#××ק×ר×× ×× ××קר×\r\n##נעתק ××ש×× ×¢×¨××ת ××ש×× ×¢×ר×ת ×××××ר\r\n###××ת\r\n##צ×× ×××××××\r\n×××× × "..., extensions=3, output_format=0) at markdown_lib.c:165
    #7 0x08072127 in markdown_to_string (
        text=0xb753a008 "#××ק×ר×× ×× ××קר×\r\n##נעתק ××ש×× ×¢×¨××ת ××ש×× ×¢×ר×ת ×××××ר\r\n###××ת\r\n##צ×× ×××××××\r\n×××× × "..., extensions=3, output_format=0) at markdown_lib.c:185
    #8 0x08073016 in main (argc=2, argv=0xbf912c54) at markdown.c:335
    (gdb)
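
For readers following the backtrace: frame #0 faults on the read `next = list->next;` inside `reverse`. The sketch below is a minimal in-place list reversal of that general shape. It is an illustration only: the `element` struct here is a simplified, hypothetical stand-in, and the function is not copied from `utility_functions.c`. What it shows is that a loop like this guards only against `NULL`, so a non-NULL garbage pointer such as `0x9dd799d7` gets dereferenced. That value is also odd, while glibc's `malloc` returns addresses aligned to at least 8 bytes, which suggests the pointer was overwritten rather than handed out by the allocator.

```c
#include <stddef.h>

/* Hypothetical, simplified list node; the parser's real struct has more fields. */
typedef struct element {
    int key;
    struct element *next;
} element;

/* Reverse a singly linked list in place and return the new head. */
static element *reverse(element *list) {
    element *reversed = NULL;
    while (list != NULL) {
        /* This read is the only place a bad pointer can fault: the loop
         * condition rejects NULL but cannot detect a stale or corrupted
         * non-NULL address. */
        element *next = list->next;
        list->next = reversed;
        reversed = list;
        list = next;
    }
    return reversed;
}
```

If the list was corrupted earlier, for example by a write past the end of a heap buffer, the crash surfaces here even though `reverse` itself is correct.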

abartov commented 12 years ago

Hey, no attachments on GitHub. Alright, the offending file is at http://benyehuda.org/~asaf/ and I have also included a smaller file, bisect_long_paragraph.txt, which produces a glibc internal error before dumping core:

    *** glibc detected *** multimarkdown_dbg: corrupted double-linked list: 0x08dae438 ***
    ======= Backtrace: =========
    /lib/i386-linux-gnu/libc.so.6(+0x6a83a)[0xb773383a]
    /lib/i386-linux-gnu/libc.so.6(+0x6ac8a)[0xb7733c8a]
    /lib/i386-linux-gnu/libc.so.6(+0x6cdc5)[0xb7735dc5]
    /lib/i386-linux-gnu/libc.so.6(__libc_malloc+0x5c)[0xb773826c]
    multimarkdown_dbg[0x8072347]
    multimarkdown_dbg[0x8069791]
    multimarkdown_dbg[0x806ab18]
    multimarkdown_dbg[0x806a23e]
    multimarkdown_dbg[0x8071074]
    multimarkdown_dbg[0x8071099]
    multimarkdown_dbg[0x806a21f]
    multimarkdown_dbg[0x80706ef]
    multimarkdown_dbg[0x80720e2]
    multimarkdown_dbg[0x8072127]
    multimarkdown_dbg[0x8073016]
    /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xe6)[0xb76dfe16]
    multimarkdown_dbg[0x8048b31]
    ======= Memory map: ========
    08048000-08080000 r-xp 00000000 08:01 326952 /usr/local/bin/multimarkdown_dbg
    08080000-08081000 rw-p 00037000 08:01 326952 /usr/local/bin/multimarkdown_dbg
    08dae000-08e95000 rw-p 00000000 00:00 0 [heap]
    b7200000-b7221000 rw-p 00000000 00:00 0
    b7221000-b7300000 ---p 00000000 00:00 0
    b7304000-b7505000 rw-p 00000000 00:00 0
    b76ab000-b76c7000 r-xp 00000000 08:01 267966 /lib/i386-linux-gnu/libgcc_s.so.1
    b76c7000-b76c8000 rw-p 0001b000 08:01 267966 /lib/i386-linux-gnu/libgcc_s.so.1
    b76c8000-b76c9000 rw-p 00000000 00:00 0
    b76c9000-b7807000 r-xp 00000000 08:01 318448 /lib/i386-linux-gnu/libc-2.13.so
    b7807000-b7808000 ---p 0013e000 08:01 318448 /lib/i386-linux-gnu/libc-2.13.so
    b7808000-b780a000 r--p 0013e000 08:01 318448 /lib/i386-linux-gnu/libc-2.13.so
    b780a000-b780b000 rw-p 00140000 08:01 318448 /lib/i386-linux-gnu/libc-2.13.so
    b780b000-b780e000 rw-p 00000000 00:00 0
    b7818000-b781a000 rw-p 00000000 00:00 0
    b781a000-b781b000 r-xp 00000000 00:00 0 [vdso]
    b781b000-b7836000 r-xp 00000000 08:01 318450 /lib/i386-linux-gnu/ld-2.13.so
    b7836000-b7837000 r--p 0001b000 08:01 318450 /lib/i386-linux-gnu/ld-2.13.so
    b7837000-b7838000 rw-p 0001c000 08:01 318450 /lib/i386-linux-gnu/ld-2.13.so
    bfd65000-bfd86000 rw-p 00000000 00:00 0 [stack]
    Aborted (core dumped)

fletcher commented 12 years ago

Does this file parse when it is broken into shorter lines? Because I can't read it, it's harder for me to scan and look for obvious problems (like the alice in wonderland example).

As an aside, why are you creating such long lines? You realize that you can insert line breaks in Markdown/MultiMarkdown text and the output remains a single paragraph? Granted, if the file is still actually valid MMD (the alice example was not, I can't tell about this one) it shouldn't really matter, but it does make troubleshooting easier.
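
One way to run the "shorter lines" experiment without being able to read the text is to soft-wrap the file mechanically. The following is a minimal sketch, not part of MultiMarkdown: `soft_wrap` is a hypothetical helper, and it assumes the whole file is already loaded into a writable, NUL-terminated buffer. Once a line has reached `width` bytes, it turns the next single ASCII space into a newline. Breaking only at byte 0x20 is safe for UTF-8 input such as Hebrew, because that byte never occurs inside a multi-byte sequence. Runs of spaces are left alone so that no line ends up with two trailing spaces, which Markdown treats as a hard line break.

```c
#include <stddef.h>

/*
 * Hypothetical helper: soft-wrap `text` in place.
 * `width` is counted in bytes, not characters, so lines of multi-byte
 * text come out visually shorter than `width` columns.
 */
static void soft_wrap(char *text, size_t width) {
    size_t col = 0; /* bytes seen on the current line */
    for (char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            col = 0;
        } else if (*p == ' ' && col >= width && p > text &&
                   p[-1] != ' ' && p[1] != ' ') {
            /* Isolated space past the limit: break the line here. */
            *p = '\n';
            col = 0;
        } else {
            col++;
        }
    }
}
```

A caveat on the design: a wrapped line can end up starting with a character that has block-level meaning in Markdown (for example `#`, `>`, or a list marker), so the wrapped file is a troubleshooting aid rather than a guaranteed-equivalent document.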

fletcher commented 10 years ago

This version of MMD is no longer under development (see MMD 4). That said, the valid (presumably) files you list parse just fine in MMD 4. The alice example doesn't, but that file is an absolute mess as far as MMD is concerned, so I'm not sure I consider that a failure.