github / cmark-gfm

GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C
Other
897 stars 174 forks source link

Use after free for text literal after extensive AST manipulation #69

Open foonathan opened 6 years ago

foonathan commented 6 years ago

I've been unable to find a small example that reproduces the error, so I try to describe it.

I have a syntax extension that does some heavy AST manipulation in the post process function. In particular, sometimes it copies all children of a paragraph into a new paragraph and cmark_node_frees the old one. In some circumstances, this leads to a heap use after free.

As far as I can tell, the issue is with text nodes whose chunk have alloc set to 0. If I understand the parsing code correctly, their memory is then owned by the parent container and is only lazily copied over. But when I add them to a new parent and destroy the old one, their chunks refer to freed memory.

I'm trying to get a small example that reproduces the error but maybe you understand the issue.

kivikakk commented 6 years ago

Yeah, I think I understand the kind of issue you're talking about. The way chunks are manipulated in general is pretty awkward, and I think there's some not-so-clear assumptions in the original codebase, which while are consistently made in the core code, are easily accidentally violated. I know I've done so while making changes to our fork here occasionally — I think there's none left now (we do extensive 24/7 fuzzing), but it's really easy to do it oneself.

I'm not sure exactly what the solution is, but it may be to consider ownership and/or have copies of nodes own their data if necessary.

foonathan commented 6 years ago

Okay, I was able to work around the issue.