MilesEngineering / MsgTools

Message Tools for embedded systems

Code generator refactoring #66

Open mgazic opened 6 years ago

mgazic commented 6 years ago

If code generator were changed to have <FOREACHMSG></FOREACHMSG>, FOREACHFIELD, FOREACHENUM(FOREACHOPTION), with <> tags for field and enum metadata, all of the functionality in each language's language.py plugin could be embedded inline in the language's template file. This would be easier to read and maintain, and make it easier for people who aren't python experts and aren't familiar with the code generator to add new languages or make their own variation on a language.

Detailed examples follow:

<FOREACHFIELD>
    static const int <NAME>_Loc   = <LOCATION>;
    static const <TYPE> <NAME>_Max   = <MAX>;
    static const <TYPE> <NAME>_Min   = <MIN>;
    static const String <NAME>_Units = '<UNITS>';
    static const int Count = 1;
</FOREACHFIELD>
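
A minimal sketch of how a `<FOREACHFIELD>` block might be expanded, assuming field metadata is already available as dictionaries. The `fields` list, the regexes, and `expand_foreach_field` are hypothetical names for illustration, not part of the existing msgparser code.

```python
import re

# Hypothetical field metadata; a real generator would read this from message definitions.
fields = [
    {"NAME": "Speed", "TYPE": "uint16_t", "LOCATION": 0, "MIN": 0, "MAX": 65535, "UNITS": "m/s"},
    {"NAME": "Heading", "TYPE": "int16_t", "LOCATION": 2, "MIN": -180, "MAX": 180, "UNITS": "deg"},
]

BLOCK_RE = re.compile(r"<FOREACHFIELD>\n(.*?)</FOREACHFIELD>\n?", re.DOTALL)
TAG_RE = re.compile(r"<([A-Z_]+)>")

def expand_foreach_field(template, fields):
    """Repeat each <FOREACHFIELD> body once per field, substituting <TAG> values."""
    def expand_block(match):
        body = match.group(1)
        out = []
        for field in fields:
            # Tags the field does not define are left untouched for later passes.
            out.append(TAG_RE.sub(lambda m: str(field.get(m.group(1), m.group(0))), body))
        return "".join(out)
    return BLOCK_RE.sub(expand_block, template)

template = (
    "<FOREACHFIELD>\n"
    "    static const int <NAME>_Loc = <LOCATION>;\n"
    "    static const <TYPE> <NAME>_Max = <MAX>;\n"
    "</FOREACHFIELD>\n"
)
result = expand_foreach_field(template, fields)
print(result)
```

The same pattern would presumably extend to `<FOREACHMSG>` and `<FOREACHENUM>` by nesting one expansion pass inside another.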

<FIELDTYPEHASH(x)> creates a new tag <x> that does a dictionary lookup, keyed by the field's type, in whatever dictionary was specified in the FIELDTYPEHASH body. It also creates <HASx> and <!HASx> conditionals that say whether the given field's type is a key in the hash table.

for C/C++:

<FIELDTYPEHASH(TYPE)>uint8: uint8_t, uint16: uint16_t, uint32: uint32_t, uint64: uint64_t, int8: int8_t, int16: int16_t, int32: int32_t, int64: int64_t, float32: float, float64: double</FIELDTYPEHASH>
<FIELDTYPEHASH(ACCESSTYPE)>uint8: Uint8, uint16: Uint16, uint32: Uint32, uint64: Uint64, int8: Int8, int16: Int16, int32: Int32, int64: Int64, float32: Float32, float64: Float64</FIELDTYPEHASH>
<FIELDTYPEHASH(REFLTYPE)>uint8: uint, uint16: uint, uint32: uint, uint64: uint, int8: int, int16: int, int32: int, int64: int, float32: float, float64: float</FIELDTYPEHASH>

for javascript:

<FIELDTYPEHASH(TYPE)>uint8: Uint8, uint16: Uint16, uint32: Uint32, uint64: Uint64, int8: Int8, int16: Int16, int32: Int32, int64: Int64, float32: Float32, float64: Float64</FIELDTYPEHASH>

for python:

<FIELDTYPEHASH(TYPE)>uint64:">Q", uint32:">L", uint16: ">H", uint8: "B", int64:">q", int32:">l", int16: ">h", int8: "b", float64:">d", float32:">f"</FIELDTYPEHASH>

for java:

<TYPE>int, short, long. 1 size larger for unsigned, to hold sign bit in upper half.  stupid java doesnt have unsigned
<ACCESSTYPE>long, int, short, ""
<CASTTYPE>long, int, short, byte
<PROMOTIONFN>uint64:error, uint32:FieldAccess.toUnsignedLong, uint16: FieldAccess.toUnsignedInt, uint8: FieldAccess.toUnsignedInt
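
A rough sketch of how the FIELDTYPEHASH declarations above might be parsed into lookup tables. It assumes that neither keys nor values contain commas, and that only the first colon in a pair separates key from value; `parse_field_type_hashes` and `lookup` are hypothetical helpers, not existing msgparser functions.

```python
import re

HASH_RE = re.compile(r"<FIELDTYPEHASH\((\w+)\)>(.*?)</FIELDTYPEHASH>", re.DOTALL)

def parse_field_type_hashes(template):
    """Return {tag_name: {field_type: replacement}} for every FIELDTYPEHASH found."""
    tables = {}
    for tag, body in HASH_RE.findall(template):
        table = {}
        for pair in body.split(","):
            key, _, value = pair.partition(":")
            if key.strip():
                table[key.strip()] = value.strip()
        tables[tag] = table
    return tables

def lookup(tables, tag, field_type):
    """Return (has_entry, value), mirroring the proposed <HASx> check and <x> tag."""
    table = tables.get(tag, {})
    return field_type in table, table.get(field_type, "")

template = (
    "<FIELDTYPEHASH(TYPE)>uint8: uint8_t, float32: float</FIELDTYPEHASH>\n"
    "<FIELDTYPEHASH(REFLTYPE)>uint8: uint, float32: float</FIELDTYPEHASH>\n"
)
tables = parse_field_type_hashes(template)
print(lookup(tables, "TYPE", "uint8"))
print(lookup(tables, "TYPE", "int64"))
```

Returning the "has entry" flag alongside the value keeps the `<HASx>` conditional and the `<x>` substitution driven by the same table.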

this is a dictionary that may include any of unsigned, signed, float, double, long. anything not specified has no suffix appended. if two match (unsigned long?), what should be done? <CONSTSUFFIXES>unsigned: u, float: f, long: l</CONSTSUFFIXES>

this iterates over all fields of a message, including fields and bitfields. many tags like <MIN>, <MAX>, <DEFAULT>, <DESCRIPTION>, <UNITS>, <LOCATION>, <COUNT>, <SIZE> exist for all fields

some like <SCALE>, <OFFSET>, <SHIFT>, <MASK>, <NUMBITS>, <ENUM> only exist for scaled fields, bitfields, or enums. conditionals like <IFBITFIELD>, <IFSCALED>, <IFENUM>, <IFARRAY> and their converse <!IFBITFIELD>, <!IFSCALED>, <!IFENUM>, <!IFARRAY> will leave out a line if they evaluate to false. conditionals will apply until end-of-line, and don't have a closing tag? there's also a parameterized version of each <IFx> like <IFx(blah)> that inserts text inline if condition is true

<FOREACHFIELD>
    /* <DESC>, (<MIN> to <MAX>)*/
    <TYPE> Get<NAME>(<IFARRAY(int index)>)
    {
<!IFBITFIELD>    <TYPE> val = _data.get<ACCESSTYPE>(<LOC><IFARRAY(+index*<SIZE>)>);
<IFBITFIELD >    <TYPE> val = Get<PARENTNAME>(<IFARRAY(index*<SIZE>)>);
<IFBITFIELD >    val = (val >> <SHIFT>) & <MASK>;
<IFSCALED   >    return val * <SCALE> + <OFFSET>;
<!IFSCALED  >    return <IFENUM((<ENUM>))>val;
    }
<IFSTRING>    String Get<NAME>String()
<IFSTRING>    {
<IFSTRING>      String ret = "";
<IFSTRING>      for(int i=0; i<msg.hdr.GetDataLength() - <LOCATION>; i++)
<IFSTRING>      {
<IFSTRING>          ret += Get<NAME>(i);
<IFSTRING>      }
<IFSTRING>      return ret;
<IFSTRING>    }
</FOREACHFIELD>
    static MsgInfo* ReflectionInfo()
    {
        static bool firstTime = true;
        static MsgInfo msgInfo(MSG_ID, "<MSGNAME>", "<MSGDESC>", MSG_SIZE);
        if(firstTime)
        {
            firstTime = false;
<FOREACHFIELD>
<!IFBITFIELD>msgInfo.AddField(new <REFLTYPE>FieldInfo("<NAME>", "<DESC>", "<UNITS>", <LOCATION>, <SIZE>, <COUNT>));
<IFBITFIELD ><!IFSCALED>msgInfo.AddField(new BitFieldInfo("<NAME>", "<DESC>", "<UNITS>", <LOCATION>, <SIZE>, <COUNT>, <SHIFT>, <MASK>));
<IFBITFIELD ><IFSCALED >msgInfo.AddField(new ScaledBitFieldInfo("<NAME>", "<DESC>", "<UNITS>", <LOCATION>, <SIZE>, <COUNT>, <SHIFT>, <MASK>, <SCALE>, <OFFSET>));
</FOREACHFIELD>
        }
        return &msgInfo;
    }

    static void Init()
    {
<FOREACHFIELD>
<IFHASDEFAULT><!IFARRAY>        Set<FIELDNAME>(<FIELDDEFAULT>);
<IFHASDEFAULT><IFARRAY >        for(int i=0; i<<FIELDCOUNT>; i++)
<IFHASDEFAULT><IFARRAY >            Set<FIELDNAME>(<FIELDDEFAULT>, i);
</FOREACHFIELD>
    }
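
One possible way to implement the line-prefix conditionals used in the templates above, sketched under the assumption that each field exposes boolean flags such as BITFIELD or SCALED. `apply_line_conditionals` and the `flags` dictionary are hypothetical. The parameterized inline form `<IFx(blah)>` is not handled here.

```python
import re

# Matches a leading <IFx> or <!IFx>, allowing padding spaces before the '>'.
COND_RE = re.compile(r"^<(!?)IF([A-Z]+)\s*>")

def apply_line_conditionals(lines, flags):
    """Keep a line only if every leading <IFx>/<!IFx> prefix agrees with the flags."""
    kept = []
    for line in lines:
        keep = True
        while True:
            m = COND_RE.match(line)
            if not m:
                break
            negated, name = m.group(1) == "!", m.group(2)
            # <IFx> needs the flag set; <!IFx> needs it clear.
            if flags.get(name, False) == negated:
                keep = False
            line = line[m.end():]  # strip the prefix whether or not the line survives
        if keep:
            kept.append(line)
    return kept

template_lines = [
    "<!IFBITFIELD>    <TYPE> val = _data.get<ACCESSTYPE>(<LOC>);",
    "<IFBITFIELD >    <TYPE> val = Get<PARENTNAME>();",
    "<IFBITFIELD ><!IFSCALED>    return val;",
]
bitfield_lines = apply_line_conditionals(template_lines, {"BITFIELD": True, "SCALED": False})
plain_lines = apply_line_conditionals(template_lines, {"BITFIELD": False, "SCALED": False})
print(bitfield_lines)
print(plain_lines)
```

Because the prefixes are consumed left to right, stacking several of them on one line behaves like a logical AND, which matches how the ReflectionInfo example combines `<IFBITFIELD >` with `<IFSCALED >`.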

how to leave comma off last option/value pair? <CHOMP> could delete one character, need to make sure it doesn't just get a whitespace trailing formatting character....

<FOREACHENUM>
enum <ENUM> { 
    <FOREACHOPTION><OPTION> : <VALUE>,</FOREACHOPTION><CHOMP>
    }
</FOREACHENUM>
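
A small sketch of one way `<CHOMP>` could avoid eating a whitespace character: delete the last non-whitespace character before the tag and keep any whitespace in between. `apply_chomp` is a hypothetical helper, and the enum options are made-up data.

```python
import re

def apply_chomp(text):
    """Delete the last non-whitespace character before each <CHOMP>, and the tag itself."""
    # Group 1 preserves any whitespace between that character and the tag.
    return re.sub(r"\S(\s*)<CHOMP>", r"\1", text)

# Hypothetical enum options, expanded the way <FOREACHOPTION> might expand them.
options = [("Red", 0), ("Green", 1), ("Blue", 2)]
body = " ".join(f"{name} : {value}," for name, value in options)
expanded = "enum Color {\n    " + body + "<CHOMP>\n}"
result = apply_chomp(expanded)
print(result)
```

An alternative that needs no tag at all is to join the per-option strings with the separator, so the trailing comma is never produced in the first place.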

what needs to remain in language.py for each language? special code for <GETMSGID> <SETMSGID> in header class could be done with <FOREACHIDFIELD>, with <IFID>?

how to handle namespacing? have multiple options for:

    <MSGNAME>:          _ delimited with full hierarchy
    <MSGSHORTNAME>:     last element, no hierarchy
    <MESSAGE_PACKAGE>:  . delimited hierarchy without last element
    <MSGPATH>           _ delimited hierarchy without last element
    <MSGPATH(d)>        d delimited hierarchy without last element
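
The naming options above can be derived from a single hierarchy list. This is only a sketch: `name_tags` is a hypothetical helper, and the example hierarchy is made up.

```python
def name_tags(hierarchy, delimiter="/"):
    """Build the proposed name tags from a hierarchy like ["Taxonomy", "Canidae", "Dog"]."""
    *path, short_name = hierarchy
    return {
        "MSGNAME": "_".join(hierarchy),        # _ delimited, full hierarchy
        "MSGSHORTNAME": short_name,            # last element only
        "MESSAGE_PACKAGE": ".".join(path),     # . delimited, without last element
        "MSGPATH": "_".join(path),             # _ delimited, without last element
        "MSGPATH(d)": delimiter.join(path),    # caller-chosen delimiter
    }

tags = name_tags(["Taxonomy", "Canidae", "Dog"], delimiter="/")
for name, value in tags.items():
    print(name, "=", value)
```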

should replacement tags have <MSG and <FIELD at start? or should it just be whatever followed that, assuming MSG generally and FIELD inside a FOREACHFIELD? should MSG be an optional prefix when inside FOREACHFIELD, to get the MSG? should PARENT be an optional prefix when inside FOREACHFIELD to get the parent field of a bitfield?

should we have <FOREACHMSG>, and generally wrap nearly entire template file in that except for header comment block and import statements?

how to handle languages that need a separate output file for each message (matlab, java)? <NEWFILE> tag in template file, like so:

        <FOREACHMSG>
        <NEWFILE> # fixed filename, or parameterized?  matlab needs + sign on all directories
        # header comment block
        import 'blah'
        .... message stuff ....
        </FOREACHMSG>
    <NEWFILE(MSGPATH(+,/)/MSGSHORTNAME.m)> +messages/+taxonomy/+canidae/dog.m
    <NEWFILE(MSGPATH(,/)/MSGSHORTNAME.java)> messages/taxonomy/canidae/dog.java
    <NEWFILE(<MSGNAME>.c)> all c files in one directory, with filename equal to _ delimited hierarchy and message name
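
A sketch of how the parameterized `<NEWFILE>` paths above might be computed, assuming `MSGPATH(prefix, delimiter)` means "prefix every directory, then join with the delimiter". `new_file_path` and its parameters are hypothetical.

```python
def new_file_path(hierarchy, extension, prefix="", delimiter="/"):
    """Build a per-message output path, prefixing each directory (matlab needs '+')."""
    *path, short_name = hierarchy
    directories = [prefix + part for part in path]
    return delimiter.join(directories + [short_name + extension])

hierarchy = ["messages", "taxonomy", "canidae", "dog"]
matlab_path = new_file_path(hierarchy, ".m", prefix="+")
java_path = new_file_path(hierarchy, ".java")
c_path = "_".join(hierarchy) + ".c"  # flat directory, _ delimited name
print(matlab_path)
print(java_path)
print(c_path)
```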

could add new features for <FOREACHMSG> <FOREACHFIELD> to general parser, and transition languages to use them where applicable. once languages have eliminated their need for their own language.py, it can be deleted, and parser can have option for language name NONE?

mgazic commented 6 years ago

Also consider implementing <ISHEADER> and <!ISHEADER>, so there can be a single template for messages and headers, with sections for headers vs. normal messages as needed.

mosminerCP commented 6 years ago

Would it make sense to consider Jinja2 as part of this effort?

Miles-Gazic-CardinalPeak commented 6 years ago

Yes, definitely we should consider Jinja2! I've only read about Jinja, haven't tried using it, but it sounds perfect for this.

I think we probably need to consider at least temporary backwards compatibility. One way to do that would be to make msgparser support both modes, and have it decide which to use based on the command-line args passed to it. Once we're confident everyone is using the new jinja templates we can refactor to remove the old template/plugin functionality.

mgazic commented 6 years ago

To use jinja for this, perhaps we can make a new msgparser language plugin called jinja. It will take jinja files as template files, but otherwise be invoked like existing plugins and require no changes to msgparser. Then we'd be able to do:

    msgparser messages obj/CodeGenerator/C jinja -t ctemplate.h
    msgparser messages obj/CodeGenerator/Python jinja -t pythontemplate.py
    msgparser messages obj/CodeGenerator/Javascript jinja -t javascripttemplate.js

This will allow work on the base jinja msgparser implementation to proceed in master without risk of breaking existing language support. Once jinja templates have been written for all languages, and regression tests have been done, we can get rid of support for all languages except jinja and remove the language plugin abstraction.

mosminerCP commented 6 years ago

I was thinking of adding a pre-processing step that searches the template for old template tags, and if found uses the current template mechanism. Otherwise assume Jinja and use that templating engine. The idea is to make backward compatibility opaque to current users; it just works.
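
A minimal sketch of that pre-processing step, assuming old-style tags are always upper-case names in angle brackets. The regex and `choose_engine` are hypothetical and only show the shape of the check.

```python
import re

# Old-style tags: <NAME>, <!NAME>, or <NAME(args)>, with an upper-case name.
LEGACY_TAG_RE = re.compile(r"<!?[A-Z][A-Z0-9_]*(\([^)]*\))?\s*>")

def choose_engine(template_text):
    """Return "legacy" if any old-style tag is present, otherwise assume Jinja."""
    return "legacy" if LEGACY_TAG_RE.search(template_text) else "jinja"

old_template = "class <MSGNAME>Message {};\n"
new_template = "#include <stdint.h>\nclass {{ msg.name }}Message {};\n"
print(choose_engine(old_template))
print(choose_engine(new_template))
```

The heuristic can misfire: a Jinja template that emits C++ such as `std::vector<T>` would look like a legacy tag, so matching against the known list of built-in tag names would probably be safer than a generic pattern.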

I doubt it would take much to make both templating approaches plugins, so it's easier to deprecate the old template engine later. The idea was to refactor the existing code a bit to abstract the template engine from the code generator, to make it easy to deprecate the old template mechanism when the time is right. Ultimately that should directly support a plugin model.

I'm fuzzy on if extra work would be required to support the plugin approach. If the plugin adds extra work and complexity I'd be tempted to just hard code in the template engines for now. It's unlikely we'd switch template engines again anytime soon, and if we do there would still be a good abstraction model in place to simplify adding another engine.

One other feature to consider with Jinja is user provided tags. MsgTools will no doubt support a set of built-in tags for code generation. It's near trivial to allow the user to supply a command line argument that defines additional tags, or a filename with additional tags. Could be a nice customization if a user wanted to provide a custom file header or something.
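
A sketch of how user-supplied tags might be collected from the command line. The `--tag NAME=VALUE` flag is an assumption for illustration; msgparser does not currently define it.

```python
import argparse

def parse_user_tags(argv):
    """Collect repeated --tag NAME=VALUE options into a dictionary."""
    parser = argparse.ArgumentParser(description="sketch of user-supplied template tags")
    # Hypothetical flag; it may be given any number of times.
    parser.add_argument("--tag", action="append", default=[], metavar="NAME=VALUE")
    args = parser.parse_args(argv)
    tags = {}
    for item in args.tag:
        name, sep, value = item.partition("=")
        if not sep or not name:
            parser.error(f"expected NAME=VALUE, got {item!r}")
        tags[name] = value
    return tags

user_tags = parse_user_tags(
    ["--tag", "FILE_HEADER=// Copyright Example Corp", "--tag", "AUTHOR=me"]
)
print(user_tags)
```

With Jinja, a dictionary like this could be passed as keyword arguments to `Template.render`, alongside whatever built-in variables the generator provides.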

mgazic commented 6 years ago

I've got the start of a jinja plugin using the existing architecture (with minimal changes to msgtools/parser/parser.py) here: https://github.com/MilesEngineering/MsgTools/compare/jinja

The path I'm going down is to have only one msgparser plugin for jinja that would be used for generating code in all languages. All language-specific features would be done by the jinja template file for that language, using the more expressive jinja templating syntax. I don't know how well that will work, though: the first attempt at a jinja template for C is mostly working, it's just a bit uglier and not as user friendly as I hoped.

mosminerCP commented 6 years ago

I see where I'm off in the weeds. It helps when I actually read. Given you're considering Jinja as a replacement for the code generation logic in language.py as well, I'm not surprised the new template is uglier. Jinja's syntax isn't as compact as a true programming language. You're not going to get around Jinja's syntax, but you might be able to compartmentalize it via macros and separate templates.

Much like language.py you could define a "template" for each language with all the macros needed to generate getters/setters etc. Then you can provide a global message template that uses the macros to generate each message. A sort of master message spec.

It won't make the individual language templates any prettier, but at least you have a global level message "spec" for all languages that should be relatively clean since it's just invoking macros. I suspect rolling your own tags will let you implement more streamlined syntax, but you'll lose all the flexibility, features, and documentation provided by Jinja. Which might be the right trade-off.

Another option might be to basically do what MsgTools does now: leave the core language constructs in Python, and stick with a simple message template. You could do that with Jinja extensions. Maybe in the process you can push a little more out of Python into Jinja, to thin out the Python code and make it easier to customize in Jinja? I'm not sure if there is opportunity to do that or not. I don't have the deeper appreciation you do for all the languages.