brettviren / moo

ruminants on module oriented programming
GNU General Public License v3.0

Slow load for otypes #39

Open brettviren opened 1 year ago

brettviren commented 1 year ago

@plasorak reports that loading moo.otypes can take O(10s) for the DAQ.

While perhaps bearable, imo this is really too slow for comfort. I'd want the overhead that moo adds to startup time to be a couple of orders of magnitude smaller.

I suspect any slowness is due to the inherent slowness of the C++ version of the Jsonnet compiler, which is used by the Python jsonnet package. There is also a Go version, which produces a mostly compatible .so shared library and for which the Jsonnet community has done substantial optimization.

In Wire-Cell Toolkit we have very large and complex Jsonnet and see a 10x-100x speedup going from the C++ to the Go version. Wire-Cell has a compile-time option to select which version to build against. Unfortunately, I do not know of an equivalent when installing the jsonnet Python module, e.g. via pip.

One check can be done here. It is always possible to precompile the .jsonnet files to .json and then load those. This load should be as fast as Python's json can manage. Any leftover slowness can be blamed on moo.otypes. This precompilation could be done with the Go version. It is not a great permanent solution, as one then has to track both the .jsonnet source and the .json artifacts.
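A minimal sketch of that check, assuming the schema has already been compiled to a .json file by some other means. The helper names (`time_call`, `load_precompiled`, `load_jsonnet`) are made up for illustration and are not part of moo. Only `_jsonnet.evaluate_file` comes from the Python jsonnet package, and it is imported lazily so the JSON path can be timed without it.

```python
import json
import time
from pathlib import Path


def time_call(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def load_precompiled(json_path):
    """Load a schema that was compiled from .jsonnet to .json ahead of time."""
    return json.loads(Path(json_path).read_text())


def load_jsonnet(jsonnet_path):
    """Compile a .jsonnet file at load time (needs the 'jsonnet' package)."""
    import _jsonnet  # lazy import: only required for this code path

    return json.loads(_jsonnet.evaluate_file(str(jsonnet_path)))
```

Timing `load_jsonnet` and `load_precompiled` on the same schema with `time_call` gives a rough split between Jsonnet compilation time and everything else. Whatever time remains after switching to the precompiled file would then point at moo.otypes itself.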

Another workaround is to move away from otypes to something new that produces .py files holding Python dataclasses or pydantic classes, written by applying the Jsonnet schema to moo templates (like we do for C++).

Doing this may also solve #38.

DAQ folks, feel free to add info/complaints on this or other topics.

plasorak commented 1 year ago

Ok, so I've given it a stab, because I think we'll need to have this at some point.

So, on my fork, in the plasorak/python-codegen branch, there are some sources that can generate Python "headers" and code to (de)serialise raw dictionaries. I've also tried to be a bit careful about validation in the classes, so for example it checks multipleOf, maximumExclusive and string patterns/regex.

This certainly simplifies the client code quite a bit (see for example daqconf), because one now only needs an import to get the class definitions, and I think it should make things faster (although I have to confess I haven't properly measured it).
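For readers who have not looked at the branch, here is a rough idea of what such a generated class could look like. This is only an illustrative sketch and not the code from plasorak/python-codegen. The `Channel` record, its fields and its limits are all invented here to show the three kinds of check mentioned above.

```python
import re
from dataclasses import asdict, dataclass

# Hypothetical constraints that a template could emit from the schema.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
_RATE_MULTIPLE_OF = 5
_RATE_EXCLUSIVE_MAX = 100


@dataclass
class Channel:
    """Hypothetical generated record; not taken from any real schema."""

    name: str
    rate: int

    def __post_init__(self):
        # String pattern check: the whole name must match the regex.
        if not _NAME_PATTERN.fullmatch(self.name):
            raise ValueError(f"name {self.name!r} does not match pattern")
        # multipleOf check.
        if self.rate % _RATE_MULTIPLE_OF != 0:
            raise ValueError(f"rate {self.rate} is not a multiple of 5")
        # Exclusive maximum check: the limit itself is rejected.
        if self.rate >= _RATE_EXCLUSIVE_MAX:
            raise ValueError(f"rate {self.rate} must be below 100")

    @classmethod
    def from_dict(cls, raw):
        """Build and validate an instance from a raw dictionary."""
        return cls(**raw)

    def to_dict(self):
        """Serialise back to a raw dictionary."""
        return asdict(self)
```

With classes like this, client code only imports the module, and no Jsonnet is evaluated at import time. Validation runs once per object in `__post_init__`, so a bad value fails at construction rather than later.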