commonmark / cmark

CommonMark parsing and rendering library and program in C

UTF8 on windows can lead to Segmentation fault #44

Closed jeroen closed 9 years ago

jeroen commented 9 years ago

My R bindings crash on Windows for certain files containing non-ASCII UTF-8 characters when rendering to XML. It is very difficult to narrow down because it happens seemingly at random and only on Windows. R uses mingw-w64 with gcc-4.6.3.

One input file that consistently causes the crash is:

https://raw.githubusercontent.com/yihui/knitr/1df6eee4ac9387a881db60316c9b334fe21d5133/NEWS.md

The strange thing is that no particular line causes the problem. Here it chokes on line 888, but modifying a random line elsewhere in the document can sometimes also fix the problem. Moreover, I noticed that enabling CMARK_OPT_NORMALIZE also prevents the problem from appearing, at least for this particular file.

jgm commented 9 years ago

It would be most helpful if you could reproduce the problem using just cmark (or libcmark). Then we'd know the problem is with cmark and not with your interface code. I couldn't reproduce any problems converting this file with cmark.

You say the problem is just in rendering to xml, not to other formats?

What kind of "crashes"?

Are you sure it's non-ascii characters that cause the problem? I didn't see any in line 888.

jeroen commented 9 years ago

As I said, I had great difficulty diagnosing the exact problem; it is very unpredictable and appears only on Windows. It freezes during cmark_render_xml (other formats work fine). I'll see if I can build the cmark CLI on Windows to get a smaller example. My wrapper is very thin: it just passes along a const char*, so not much can go wrong there.

jeroen commented 9 years ago

I was able to reproduce the problem using a Windows build of cmark.exe. I put a copy of my build and the NEWS.md file here: http://www.stat.ucla.edu/~jeroen/files/cmark-win32.zip.

To reproduce the problem, open a Windows shell and run:

cmark -t xml NEWS.md

Other formats work fine:

cmark -t html NEWS.md
cmark -t man NEWS.md

jeroen commented 9 years ago

I tried building cmark with a more recent version of gcc on Windows, and I keep getting segmentation faults when running this example.

nwellnhof commented 9 years ago

This can be reproduced by running

cmark -t xml NEWS.md

on Windows. I tracked it down to the call to vsnprintf in cmark_strbuf_vprintf. The Windows version of vsnprintf is not POSIX/C99-compliant: it returns -1 if the output was truncated, rather than the length the full output would have needed.
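
To illustrate the difference, here is a minimal sketch of a formatting helper that tolerates both behaviours. It is not cmark's code: `format_alloc` and `FORMAT_ALLOC_MAX` are hypothetical names made up for this example, and the 1 MiB cap is an arbitrary assumption. A C99 runtime reports the required length, so the buffer can be resized exactly; a runtime that returns a negative value on truncation gives no size hint, so the buffer is doubled and the call retried.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary upper bound for this sketch, so that a genuine formatting
 * error (which also yields a negative return) cannot loop forever. */
#define FORMAT_ALLOC_MAX ((size_t)1 << 20)

/* Hypothetical helper: format into a heap buffer that grows until the
 * output fits. Returns a malloc'd, NUL-terminated string, or NULL on
 * failure. The caller frees the result. */
static char *format_alloc(const char *fmt, ...)
{
    size_t size = 16;
    char *buf = NULL;

    assert(fmt != NULL);

    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        /* The va_list must be restarted for every attempt. */
        va_list ap;
        va_start(ap, fmt);
        int len = vsnprintf(buf, size, fmt, ap);
        va_end(ap);

        if (len >= 0 && (size_t)len < size)
            return buf; /* output fit, including the terminating NUL */

        if (len >= 0)
            size = (size_t)len + 1; /* C99: required length is known */
        else
            size *= 2; /* negative return: no size hint, so double */

        if (size > FORMAT_ALLOC_MAX) {
            free(buf);
            return NULL;
        }
    }
}
```

Code that treats the return value as a length without checking for a negative result can end up computing a bogus buffer size, which is consistent with the crash seen here.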

jeroen commented 9 years ago

Here is some strace output: http://pastebin.com/raw.php?i=D6AgsCCd And here is output from drmemory: http://pastebin.com/raw.php?i=26N2Q7xE