Closed chandlerc closed 1 month ago
What do you think about adding more comments to this code? I'm seeing a lot that's not commented, and I think it'd be easier to review if there were more explanation of what the entities are.
I just forgot to go back to that item in my own TODOs before sending this out for review, mostly in trying to get it out sooner. Sorry about that. I'll make a pass through.
Can you take some time to walk through the source_gen files and add comments? While I see public members in source_gen.h have comments, none of the private members do (including public members of the private UniqueIdPopper), and most static/constexpr entities in the cpp file don't. Not sure if you missed this because I didn't comment on the file specifically.
No, I think I just only looked at a few places rather than the whole file, sorry about that.
Some of the functions don't seem to need much in the way of comments, the function name and param names would just be repeated in a sentence I think... But a bunch definitely needed comments. There was a bunch of subtle stuff, etc... Anyways, everything that jumped out at me as needing a comment has one now I think, and I'm much more sure I actually went all the way through the file rather than just glancing at one area in the file and forgetting about the rest. If there are still things that would really benefit from a comment, it may be useful to understand a bit better what needs clarification at this point as I hopefully didn't miss anything for a third time.
I also spotted a few places where one comment you made applies more broadly and tried to update based on it, and cleaned up a few things that adding comments made me think better about.
Ok, all the renaming of ID
stuff is I think done. At least, my regex-ing in source_gen*
seems to be clean.
The big addition here is a very, very rough and very early skeleton of a source code generator framework. This builds upon the lexers identifier synthesis logic, improving on its framework and wiring it up with the most rudimentary of source file generation. This is just enough to roughly replicate my "big API file" source code benchmarks.
The source generation works very hard to both vary the structure and content of the source as much as possible while ensuring the same total amount of each construct is in use, from bytes in identifiers to line breaks, parameters, etc. This lets us generate randomly structure inputs that should consistently take the exact same amount of total work to compile.
The complex identifier synthesis logic from the lexer's benchmark is moved over here and the lexer uses APIs in the source generator for identifiers. The other source synthesis in the lexer's benchmark isn't yet moved over, but should likely be slowly absorbed here as it can be refactored into a more principled and re-usable form. Some bits may stay of course if they're just too lexer-specific.
Next, this adds a simple end-to-end compile benchmark for the driver that directly and much more clearly reproduces all the measurements I've done manually up until now. It should also be easy to extend to more patterns over time as we add support to the source generator to produce those patterns.
Last but not least, I've added a tiny CLI to the source generator so that you can generate source code manually. This is especially nice for generating demo source code to actually run through the driver or look at in an editor. The CLI can also generate C++ source code which lets us do some minimal comparative benchmarking between Carbon and C++/Clang.
There are huge number of TODOs in the source generation framework. This is going to be a large ongoing effort I suspect.
There are also a bunch of rough edges I've left to try and get this out for review sooner. I've left TODOs for refactorings that really need to be done here, but hoping these can maybe be follow-ups. If not, please flag and I'll try to layer them on here.
Sample compile benchmark output, nicely showing where we are w.r.t. our goal speeds (2x behind on lex and check, 5x on parse) at least on a recent AMD server CPU:
Lest someone think this is bad, the fact that we're already within 2x of our rather audacious goals makes me quite happy. =D