commonmark / cmark

CommonMark parsing and rendering library and program in C
Other
1.6k stars 527 forks source link

Severe performance regression #523

Closed mity closed 5 months ago

mity commented 5 months ago

For some very rough benchmarking purposes I've just tried cmark and md2html from MD4C project to process the benchinput.md as produced by cmark's make bench.

On my windows machine, cmark (current head) seems to take horrendous 40+ secs, while MD4C is below 1 sec still with a safe margin. cmark used to be about 2 times slower than MD4C so the current factor of 60 is quite significant.

/src/cmark @master $ time -p /prj/md4c/build32-release/md2html/md2html.exe ./bench/benchinput.md >/dev/null
real 0.73
user 0.00
sys 0.01
/src/cmark @master $ time -p cmark ./bench/benchinput.md >/dev/null
real 44.02
user 0.01
sys 0.00
/src/cmark @master $

P.S. Don't mind the zeroes for sys/user. Likely because it's the MSYS2 environment on Windows.

P.P.S. I haven't used cmark for year or two so the regression may be there for quite some time already.

mity commented 5 months ago

Sorry for the noise, it's caused by something in the MSYS2 environment. When run outside of it, such a slowdown doesn't occur.

mity commented 5 months ago

(Reopening due the following findings. But feel free to close if you see it as irrelevant.)

I was interested in what's behind the heavy slowdown under MSYS2. Turns out there's exactly one single culprit, namely calling printf() on this line:

https://github.com/commonmark/cmark/blob/76cbc2d9162c1154327a746b8ad0d3f1a72b1b30/src/main.c#L72

What's even crazier, it seems that under hood the printf() (from C:\WINDOWS\SysWOW64\msvcrt.dll which is default target runtime lib for mingw64 toolchain) internally calls putc() for every single character. Depending on details how the other side of the pipe (shell) reads from the stdout, it may also explain the difference between bash in MSYS2 and CMD.COM.

Replacing the printf() with fwrite() makes the 40+ secs down to ~1.4 sec.

jgm commented 5 months ago

Interesting. This seems a perfectly good reason to use fwrite here.