Introduce a source generator and end-to-end compile benchmarks

chandlerc commented 2 months ago

The big addition here is a very, very rough and very early skeleton of a source code generator framework. This builds upon the lexer's identifier synthesis logic, improving on its framework and wiring it up with the most rudimentary of source file generation. This is just enough to roughly replicate my "big API file" source code benchmarks.

The source generation works very hard to vary the structure and content of the source as much as possible while ensuring the same total amount of each construct is in use, from bytes in identifiers to line breaks, parameters, etc. This lets us generate randomly structured inputs that should consistently take the exact same amount of total work to compile.
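To make the "same totals, different structure" idea concrete, here is a minimal sketch, not the code in this PR. It assumes a fixed pool of identifier lengths whose order is the only thing the seed changes, so the total number of identifier bytes is identical for every seed. The names `ShuffledIdLengths`, `MakeId`, and `TotalBytes`, and the 1 to 8 byte length cycle, are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper, not the API added in this PR. Returns `count`
// identifier lengths. The set of lengths depends only on `count`; the seed
// only changes their order, so the byte total is the same for every seed.
std::vector<int> ShuffledIdLengths(int count, unsigned seed) {
  assert(count >= 0);
  std::vector<int> lengths(count);
  // Assumed distribution for illustration: cycle through lengths 1..8.
  for (int i = 0; i < count; ++i) {
    lengths[i] = 1 + i % 8;
  }
  std::mt19937 rng(seed);
  std::shuffle(lengths.begin(), lengths.end(), rng);
  return lengths;
}

// Builds one identifier of exactly `length` bytes from random lowercase
// letters. The content varies with the generator state; the size does not.
std::string MakeId(int length, std::mt19937& rng) {
  std::uniform_int_distribution<int> letter(0, 25);
  std::string id;
  id.reserve(length);
  for (int i = 0; i < length; ++i) {
    id.push_back(static_cast<char>('a' + letter(rng)));
  }
  return id;
}

// Sums the identifier lengths, i.e. the total identifier bytes to be emitted.
int TotalBytes(const std::vector<int>& lengths) {
  return std::accumulate(lengths.begin(), lengths.end(), 0);
}
```

Two different seeds give differently ordered (and differently spelled) identifiers, but `TotalBytes` returns the same value for both. The same budget-then-shuffle approach could in principle be applied to other constructs such as parameters or line breaks.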

The complex identifier synthesis logic from the lexer's benchmark is moved over here and the lexer uses APIs in the source generator for identifiers. The other source synthesis in the lexer's benchmark isn't yet moved over, but should likely be slowly absorbed here as it can be refactored into a more principled and re-usable form. Some bits may stay of course if they're just too lexer-specific.

Next, this adds a simple end-to-end compile benchmark for the driver that directly and much more clearly reproduces all the measurements I've done manually up until now. It should also be easy to extend to more patterns over time as we add support to the source generator to produce those patterns.
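As a rough illustration of what a lines-per-second measurement involves, here is a standard-library-only sketch. It does not use the driver or the benchmark harness from this PR; `CountLines`, `MeasureLinesPerSecond`, and the `work` callback are hypothetical stand-ins for "run one compile phase over a generated source file".

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>

// Counts newline characters, used here as the number of source lines.
std::size_t CountLines(const std::string& source) {
  return static_cast<std::size_t>(
      std::count(source.begin(), source.end(), '\n'));
}

// Runs `work(source)` `iterations` times and returns the number of lines
// processed per second of wall-clock time. `work` stands in for a compile
// phase; it should have an observable side effect so the optimizer cannot
// remove it.
template <typename WorkT>
double MeasureLinesPerSecond(const std::string& source, int iterations,
                             WorkT work) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    work(source);
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  // Guard against a zero reading from a coarse clock.
  if (elapsed.count() <= 0.0) {
    return 0.0;
  }
  double total_lines = static_cast<double>(CountLines(source)) * iterations;
  return total_lines / elapsed.count();
}
```

A real benchmark framework additionally picks the iteration count automatically and reports CPU time separately from wall-clock time, which this sketch does not attempt.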

Last but not least, I've added a tiny CLI to the source generator so that you can generate source code manually. This is especially nice for generating demo source code to actually run through the driver or look at in an editor. The CLI can also generate C++ source code which lets us do some minimal comparative benchmarking between Carbon and C++/Clang.

There are a huge number of TODOs in the source generation framework. This is going to be a large ongoing effort I suspect.

There are also a bunch of rough edges I've left to try and get this out for review sooner. I've left TODOs for refactorings that really need to be done here, but I'm hoping these can be follow-ups. If not, please flag and I'll try to layer them on here.

Sample compile benchmark output, nicely showing where we are w.r.t. our goal speeds (2x behind on lex and check, 5x on parse) at least on a recent AMD server CPU:

------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations      Lines
------------------------------------------------------------------------------------------------------
BM_CompileAPIFileDenseDecls<Phase::Lex>/256           29420 ns        29419 ns        22860 6.62847M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024         146130 ns       146128 ns         4840 6.69959M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096         601584 ns       601577 ns         1020 6.69573M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384       2547578 ns      2547313 ns          280   6.404M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      10816591 ns     10816389 ns           80 6.05193M/s
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     52191320 ns     52189828 ns           20 5.02261M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/256        101706 ns       101698 ns         6900 1.91745M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024       512161 ns       512162 ns         1380  1.9115M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096      2078426 ns      2078430 ns          340   1.938M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384     8795786 ns      8795583 ns          100 1.85468M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    35073596 ns     35072973 ns           20 1.86639M/s
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144  151100688 ns    151097370 ns           20 1.73483M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/256        957059 ns       957049 ns          740 203.751k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/1024      1956134 ns      1955985 ns          360 500.515k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/4096      5797864 ns      5797417 ns          120 694.792k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21219608 ns     21217584 ns           40 768.843k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    96311116 ns     96302334 ns           20 679.734k/s
BM_CompileAPIFileDenseDecls<Phase::Check>/262144  371637963 ns    371609964 ns           20 705.387k/s

Lest someone think this is bad, the fact that we're already within 2x of our rather audacious goals makes me quite happy. =D

chandlerc commented 2 months ago

What do you think about adding more comments to this code? I'm seeing a lot that's not commented, and I think it'd be easier to review if there were more explanation of what the entities are.

I just forgot to go back to that item in my own TODOs before sending this out for review, mostly in trying to get it out sooner. Sorry about that. I'll make a pass through.

chandlerc commented 2 months ago

Can you take some time to walk through the source_gen files and add comments? While I see public members in source_gen.h have comments, none of the private members do (including public members of the private UniqueIdPopper), and most static/constexpr entities in the cpp file don't. Not sure if you missed this because I didn't comment on the file specifically.

No, I think I just only looked at a few places rather than the whole file, sorry about that.

Some of the functions don't seem to need much in the way of comments, the function name and param names would just be repeated in a sentence I think... But a bunch definitely needed comments. There was a bunch of subtle stuff, etc... Anyways, everything that jumped out at me as needing a comment has one now I think, and I'm much more sure I actually went all the way through the file rather than just glancing at one area in the file and forgetting about the rest. If there are still things that would really benefit from a comment, it may be useful to understand a bit better what needs clarification at this point as I hopefully didn't miss anything for a third time.

I also spotted a few places where one comment you made applies more broadly and tried to update based on it, and cleaned up a few things that adding comments made me think better about.

chandlerc commented 1 month ago

Ok, all the renaming of ID stuff is I think done. At least, my regex-ing in source_gen* seems to be clean.

carbon-language / carbon-lang

Introduce a source generator and end-to-end compile benchmarks #4124