Rewrite primitives library and related code generation

chipkent commented 3 years ago

The primitives library is a mess. We need better code generation, and the library needs to be rewritten with it.

devinrsmith commented 3 years ago

We may want to consider https://github.com/square/javapoet if we think the need to generate will be more than a one-off.

kosak commented 3 years ago

I did some work on code generation for Deephaven formulas and I wrote a library you might want to look at.

The focuses of my library are

substitution
composition
support for optional and repeated code blocks
Source code readability (on the generator side)
Getting indentation right

Formula generation gets pretty hairy and the code we had before was pretty hard to maintain. But here's an excerpt from the current code that has to do something kind of tricky.

    @NotNull
    private CodeGenerator generateNormalContextClass() {
        final CodeGenerator g = CodeGenerator.create(
                "private class FormulaFillContext implements [[FILL_CONTEXT_CANONICAL]]", CodeGenerator.block(
                        // The optional i chunk
                        CodeGenerator.optional("needsIChunk", "private final WritableIntChunk<Attributes.OrderedKeyIndices> __iChunk;"),
                        // The optional ii chunk
                        CodeGenerator.optional("needsIIChunk", "private final WritableLongChunk<Attributes.OrderedKeyIndices> __iiChunk;"),
                        // fields
                        CodeGenerator.repeated("defineField", "private final ColumnSource.GetContext __subContext[[COL_SOURCE_NAME]];"),
                        // constructor
                        "FormulaFillContext(int __chunkCapacity)", CodeGenerator.block(
                                CodeGenerator.optional("needsIChunk", "__iChunk = WritableIntChunk.makeWritableChunk(__chunkCapacity);"),
                                CodeGenerator.optional("needsIIChunk", "__iiChunk = WritableLongChunk.makeWritableChunk(__chunkCapacity);"),
                                CodeGenerator.repeated("initField", "__subContext[[COL_SOURCE_NAME]] = [[COL_SOURCE_NAME]].makeGetContext(__chunkCapacity);")
                        ),
                        "",
                        "@Override",
                        "public void close()", CodeGenerator.block(
                                CodeGenerator.optional("needsIChunk", "__iChunk.close();"),
                                CodeGenerator.optional("needsIIChunk", "__iiChunk.close();"),
                                CodeGenerator.repeated("closeField", "__subContext[[COL_SOURCE_NAME]].close();")
                        )
                )
        );
        g.replace("FILL_CONTEXT_CANONICAL", Formula.FillContext.class.getCanonicalName());
        if (usesI) {
            g.activateAllOptionals("needsIChunk");
        }
        if (usesII) {
            g.activateAllOptionals("needsIIChunk");
        }
        visitFormulaParameters(null,
                cs -> {
                    final CodeGenerator defineField = g.instantiateNewRepeated("defineField");
                    final CodeGenerator initField = g.instantiateNewRepeated("initField");
                    final CodeGenerator closeField = g.instantiateNewRepeated("closeField");
                    defineField.replace("COL_SOURCE_NAME", cs.name);
                    initField.replace("COL_SOURCE_NAME", cs.name);
                    closeField.replace("COL_SOURCE_NAME", cs.name);
                    return null;
                }, null, null);
        return g.freeze();
    }

One possible output of this function is

    private class FormulaFillContext implements com.illumon.iris.db.v2.select.Formula.FillContext {
        private final WritableIntChunk<Attributes.OrderedKeyIndices> __iChunk;
        private final WritableLongChunk<Attributes.OrderedKeyIndices> __iiChunk;
        private final ColumnSource.GetContext __subContextII;
        private final ColumnSource.GetContext __subContextI;
        FormulaFillContext(int __chunkCapacity) {
            __iChunk = WritableIntChunk.makeWritableChunk(__chunkCapacity);
            __iiChunk = WritableLongChunk.makeWritableChunk(__chunkCapacity);
            __subContextII = II.makeGetContext(__chunkCapacity);
            __subContextI = I.makeGetContext(__chunkCapacity);
        }

        @Override
        public void close() {
            __iChunk.close();
            __iiChunk.close();
            __subContextII.close();
            __subContextI.close();
        }
    }

Things to notice:

The general pattern here is to have a "template" (near the top of the function) with metavariables (for substitution), optional clauses (or optional subgenerators), and repeated clauses (or subgenerators). The behavior of those items are controlled by code at the bottom which may set, enable/disable, or instantiate a new clone of a repeated item.

The template is defined with nested function calls, so that the template code indents really nicely in the source code but without using spaces inside literal strings. See (e.g. Codegenerator.block) above. The advantage of this is we leverage the auto-identing that the IDE is already doing for us at the source code leve. This is an advantage over JavaPoet, where the source code looks "flat"... the logical indentation is not preserved. Notice that the library handles all indentation... there is no place where you would want to put (N blank spaces) at the start of any literal string. If you compose a generator inside another generator, the indentation will be adjusted accordingly.

The nice thing about this is you (as a programmer / code reviewer) can read the "template" in one place and get a pretty good idea of what the function being written is trying to do.

Also the composability story is pretty convenient. For example, this is my code that builds the whole class: as you can see it simply composes generators from subgenerators:

    @NotNull
    String generateClassBody() {
        if (params == null) {
            params = QueryScope.getDefaultInstance().getParams(userParams);
        }

        final TypeAnalyzer ta = TypeAnalyzer.create(returnedType);

        final CodeGenerator g = CodeGenerator.create(
                QueryLibrary.getImportStatement(), "",
                "public class $CLASSNAME$ extends [[FORMULA_CLASS_NAME]]", CodeGenerator.block(
                        generateFormulaFactoryLambda(), "",
                        CodeGenerator.repeated("instanceVar", "private final [[TYPE]] [[NAME]];"),
                        "private final Map<Object, Object> [[LAZY_RESULT_CACHE_NAME]];",
                        timeInstanceVariables, "",
                        generateConstructor(), "",
                        generateAppropriateGetMethod(ta, false), "",
                        generateAppropriateGetMethod(ta, true), "",
                        generateOptionalObjectGetMethod(ta, false),
                        generateOptionalObjectGetMethod(ta, true),
                        generateGetChunkType(ta), "",
                        generateFillChunk(false), "",
                        generateFillChunk(true), "",
                        generateFillChunkHelper(ta), "",
                        generateApplyFormulaPerItem(ta), "",
                        generateMakeFillContext(), "",
                        generateNormalContextClass(), "",
                        generateIntSize()
                ),
                ""
        );
        g.replace("FORMULA_CLASS_NAME", Formula.class.getCanonicalName());
        g.replace("LAZY_RESULT_CACHE_NAME", LAZY_RESULT_CACHE_NAME);
        visitFormulaParameters(null,
                cs -> {
                    final CodeGenerator fc = g.instantiateNewRepeated("instanceVar");
                    fc.replace("TYPE", cs.columnSourceGetTypeString);
                    fc.replace("NAME", cs.name);
                    return null;
                },
                ca -> {
                    final CodeGenerator fc = g.instantiateNewRepeated("instanceVar");
                    fc.replace("TYPE", ca.dbArrayTypeString);
                    fc.replace("NAME", ca.name);
                    return null;
                },
                p -> {
                    final CodeGenerator fc = g.instantiateNewRepeated("instanceVar");
                    fc.replace("TYPE", p.typeString);
                    fc.replace("NAME", p.name);
                    return null;
                });
        return g.build();
    }

As a comparison to JavaPoet, I think JavaPoet does a better job of being more "formal" with the language semantics, whereas my library is really just munging string blocks. On the other hand, I think the operations in my library (composability, optional blocks, repeated blocks, and having the source code kind of look like the result) make it pretty useful for our purposes. I also think my library tends to be more readable when you are doing something complicated. This readability comes from the fact that you write your code template and then you sub in, enable, disable, or instantiate pieces according to your logic.

One way this really helps is when you have conditional code. If you look at the first template I posted above, you can see that the needsIChunk functionality is distributed over three different places in the code: an area where it is defined, an area where it is used, and an area where it is closed. These three areas are defined by a CodeGenerator.optional() directive in the template, and activated by code like this

         if (usesI) {
            g.activateAllOptionals("needsIChunk");
        }

Similarly, there is code for a variable number of fields: for each field we need a clause to define the field, to init the field, and to close the field. These code blocks were introduced by g.instantiateNewRepeated() and instantiated by code like this (running once for each new field)

   final CodeGenerator defineField = g.instantiateNewRepeated("defineField");
   final CodeGenerator initField = g.instantiateNewRepeated("initField");
   final CodeGenerator closeField = g.instantiateNewRepeated("closeField");
   defineField.replace("COL_SOURCE_NAME", cs.name);
   initField.replace("COL_SOURCE_NAME", cs.name);
   closeField.replace("COL_SOURCE_NAME", cs.name);

The corresponding code in JavaPoet would be much more low-level and the "if" test would be repeated in three different places; the logical relationship between the pieces would be more obscure than it is with my library.

To see it in action in a rather complicated code generation scenario (used for formula generation in production in our system), check out com.illumon.iris.db.v2.select.FormulaSample (output) generated by com.illumon.iris.db.v2.select.DhFormulaColumn triggered by com.illumon.iris.db.v2.select.TestFormulaColumnGeneration.generateFiles()

I realize this was a "throw you in the deep end" kind of explanation but I'm very available to explain it further, help out with examples, fix bugs, extend the library, that kind of thing.

chipkent commented 3 years ago

We may want to consider https://github.com/square/javapoet if we think the need to generate will be more than a one-off.

I looked at javapoet. For this use-case, javapoet will yield very hard to manage and modify code. It is very low level. The solution I arrived at is more simple and easy to manage.

chipkent commented 3 years ago

@kosak The implementation I'm going to hand you to review is not using your code generation. I'm doing what I think is more simple and maybe something you want to consider for some use cases. I am very open to discussions about how the generation is done and am open to changes if it makes sense. In my case, I am using a java equivalent of golang templates.

chipkent commented 3 years ago

https://reviewboard.eng.illumon.com/r/3651/

deephaven / deephaven-core

Rewrite primitives library and related code generation #72