c4-project / c4f

The C4 Concurrent C Fuzzer
MIT License

Implement the 'lexer hack' for C #52

MattWindsor91 closed this issue 4 years ago

MattWindsor91 commented 5 years ago

The C parser/lexer, being based heavily on the classic K&R C89 grammar, makes a syntactic distinction between typedef names and identifiers. This means that we need to feed the lexer the names of all typedef'd types, which in turn means the parser needs to propagate them backwards: the classic C 'lexer hack'.

The Lib.Frontend interface doesn't assume any flow of information backwards through the pipeline, so it might need generalising. The easiest thing to do here might be to feed the tokeniser and parser some mutable tables, but this seems somewhat ugly. There might be a way to do it with Menhir's inspection API too.
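
For illustration, here is a minimal sketch of the mutable-table option, assuming a toy token type and a word-by-word tokeniser. All of the names in it (`token`, `typedef_table`, `classify`, `parse_words`) are hypothetical. None of them come from c4f's Lib.Frontend or from Menhir, and the "parser" is a stand-in for a real grammar.

```ocaml
(* Hypothetical names throughout: nothing here is c4f's or Menhir's API. *)

type token =
  | TYPEDEF
  | TYPE_NAME of string (* a word already registered as a typedef name *)
  | IDENT of string     (* any other word; the toy treats `int` this way too *)
  | SEMI

(* The mutable table handed to both the tokeniser and the parser. *)
type typedef_table = (string, unit) Hashtbl.t

(* Tokeniser side: classify one word by consulting the table as it stands now. *)
let classify (table : typedef_table) (word : string) : token =
  match word with
  | "typedef" -> TYPEDEF
  | ";" -> SEMI
  | w when Hashtbl.mem table w -> TYPE_NAME w
  | w -> IDENT w

(* Parser side: a toy stand-in for a real grammar. It treats the last word
   before the `;` of a typedef declaration as the new type name and records
   it in the table. Words are classified one at a time, so later words see
   the updated table. *)
let parse_words (table : typedef_table) (words : string list) : token list =
  let rec go in_typedef last acc = function
    | [] -> List.rev acc
    | w :: rest ->
      let tok = classify table w in
      (match tok, in_typedef, last with
       | SEMI, true, Some name ->
         Hashtbl.replace table name ();
         go false None (tok :: acc) rest
       | TYPEDEF, _, _ -> go true None (tok :: acc) rest
       | (IDENT n | TYPE_NAME n), true, _ -> go true (Some n) (tok :: acc) rest
       | _ -> go in_typedef last (tok :: acc) rest)
  in
  go false None [] words

let () =
  let table : typedef_table = Hashtbl.create 16 in
  let toks = parse_words table [ "typedef"; "int"; "foo"; ";"; "foo"; "x"; ";" ] in
  (* The first `foo` is lexed before registration, the second after it. *)
  assert (List.nth toks 2 = IDENT "foo");
  assert (List.nth toks 4 = TYPE_NAME "foo")
```

The sketch relies on tokens being classified on demand. If the whole input were tokenised before parsing began, the table would still be empty when the second `foo` was lexed, which is why the hack needs information to flow backwards through the pipeline.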

MattWindsor91 commented 5 years ago

I'm putting this on the back-burner for now: it's much more involved than I thought it would be, it's not a massive priority for what we want to do with C, and it's not clear how to solve it without accidentally re-implementing, say, CompCert.

MattWindsor91 commented 4 years ago

Gonna close this because, in a separate issue, I’m going to advocate leaving the Litmus support as a not-quite-C language (without typedefs) and instead delegate proper C and C++ parsing to external libraries à la #69.