bonsai-rx / bonsai

The compiler, IDE, and standard library for the Bonsai visual programming language for reactive systems
https://bonsai-rx.org
MIT License

Performance issues when loading include workflows #1680

Closed: glopesdev closed this issue 2 months ago

glopesdev commented 7 months ago

As we use more and more embedded resources, a growing issue is the performance penalty incurred when accessing the first embedded resource in a newly loaded assembly. The penalty seems to be paid only once per assembly, when its first resource is loaded, but it is paid for every single assembly with embedded resources.

The first step here would be to investigate the performance penalty further with benchmarks on both .NET Framework and .NET Core, with different types of resources and different numbers of assemblies and resources loaded (e.g. does an assembly with 1 embedded resource pay a smaller cost than an assembly with 10 embedded resources?).
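A minimal benchmark sketch along these lines could quantify the first-access penalty when run under both .NET Framework and .NET Core; the assembly paths are passed as arguments, and the timing harness below is illustrative rather than existing Bonsai code:

using System;
using System.Diagnostics;
using System.Reflection;

class ResourceLoadBenchmark
{
    static void Main(string[] args)
    {
        // args: paths of assemblies with embedded resources (placeholders).
        foreach (var path in args)
        {
            var assembly = Assembly.LoadFrom(path);
            foreach (var name in assembly.GetManifestResourceNames())
            {
                // Time the first access, which is expected to pay the one-off cost...
                var stopwatch = Stopwatch.StartNew();
                using (assembly.GetManifestResourceStream(name)) { }
                var first = stopwatch.ElapsedMilliseconds;

                // ...and a second access, which should hit already initialized state.
                stopwatch.Restart();
                using (assembly.GetManifestResourceStream(name)) { }
                Console.WriteLine($"{name}: first={first} ms, second={stopwatch.ElapsedMilliseconds} ms");
            }
        }
    }
}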

glopesdev commented 2 months ago

After further investigation, it turns out the cost arises from an interaction between partial loading of include workflows and the creation of multiple XmlSerializer instances. Loading a complex workflow with multiple included workflows referencing multiple packages exacerbates the problem the most. Benchmarking operator insertion times aligned with XmlSerializer creation reveals the cause of the performance bottleneck:

Creating new XML serializer... 0
Creating new XML serializer... 1
Time to insert Aeon.Acquisition:TimestampGenerator.bonsai: 223 ms
Creating new XML serializer... 2
Time to insert Aeon.Acquisition:CameraController.bonsai: 241 ms
Creating new XML serializer... 3
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 327 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Time to insert Aeon.Acquisition:SpinnakerVideoSource.bonsai: 0 ms
Creating new XML serializer... 4
Time to insert Aeon.Acquisition:AudioSource.bonsai: 297 ms
Creating new XML serializer... 5
Time to insert Aeon.Foraging:UndergroundFeeder.bonsai: 442 ms
Creating new XML serializer... 6
Time to insert Aeon.Acquisition:TimedRetryCount.bonsai: 388 ms
Time to insert Aeon.Foraging:UndergroundFeeder.bonsai: 31 ms
Time to insert Aeon.Acquisition:TimedRetryCount.bonsai: 0 ms
Time to insert Aeon.Foraging:UndergroundFeeder.bonsai: 3 ms
Time to insert Aeon.Acquisition:TimedRetryCount.bonsai: 0 ms
Creating new XML serializer... 7
Time to insert Aeon.Environment:WeightScale.bonsai: 481 ms
Creating new XML serializer... 8
Time to insert Aeon.Environment:RfidReader.bonsai: 439 ms
Time to insert Aeon.Environment:RfidReader.bonsai: 1 ms

The reason multiple XmlSerializer instances are required in the first place is the plugin nature of language packages, which makes it impossible to know beforehand the exact list of types in a workflow. New XmlSerializer instances are created on demand when new types are registered, either at workflow loading or saving time.
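In broad strokes, the on-demand creation behaves like the sketch below (a simplification; the cache, the GetSerializer signature, and the WorkflowDocument stand-in for the actual workflow root type are illustrative). The key cost is that XmlSerializer constructors taking extra types do not cache the generated serialization assembly, so every batch of newly registered types pays the full code-generation cost again:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Stand-in for the workflow document root (WorkflowBuilder in Bonsai.Core).
public class WorkflowDocument { }

public static class WorkflowSerializerCache
{
    static XmlSerializer serializer;
    static readonly HashSet<Type> serializerTypes = new HashSet<Type>();

    // Returns a serializer that knows about every extension type seen so far.
    // Whenever a previously unseen type is registered (e.g. from an include
    // workflow), a new XmlSerializer must be created. Constructor overloads
    // taking extra types do not cache the generated serialization assembly,
    // so each new serializer pays the full code-generation cost again, which
    // is consistent with the several-hundred-millisecond pauses in the log above.
    public static XmlSerializer GetSerializer(IEnumerable<Type> extensionTypes)
    {
        var newTypes = false;
        foreach (var type in extensionTypes)
        {
            newTypes |= serializerTypes.Add(type);
        }

        if (serializer == null || newTypes)
        {
            var extraTypes = new Type[serializerTypes.Count];
            serializerTypes.CopyTo(extraTypes);
            serializer = new XmlSerializer(typeof(WorkflowDocument), extraTypes);
        }
        return serializer;
    }
}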

Type registration is batched so that all types in the workflow are collected at once, but because IncludeWorkflow instances separately load workflows from multiple files, the types in those files are only known when the included workflow is actually processed. Therefore, the current serialization strategy is not able to pre-load all required types at once when deserializing the top-level root workflow, and instead has to wait for new types in each included workflow to trickle down as each file is scanned and deserialized.

This effect will be exacerbated in workflows containing multiple diverse included workflows, each referencing very different types. For example, one workflow with 100 includes of the same file will create only 1 extra serializer, whereas 100 includes of 50 different files can potentially create up to 50 extra serializers (assuming there is at least one non-overlapping type across the 50 different files).

Possible solutions

Pre-loading of include workflows during type discovery

Each loaded workflow file is pre-scanned for extension types (see WorkflowBuilder). Currently the search is restricted to scanning types from the same file being loaded. This could be extended to pre-emptively open and scan types from every included workflow in the file, in depth-first fashion.

Care might be required to avoid excessive recursive inspection of workflows when actually loading the included workflows (which will themselves be scanned recursively). One option might be to use memoization and store a table of already inspected workflows (invalidated at the end of the load call), as in the sketch below. However, if the main bottleneck is file I/O this might not be very important, as the file system will already cache file contents for repeated read access.
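A sketch of the recursive pre-scan with memoization follows; the element and attribute handling is illustrative (the actual type discovery lives in WorkflowBuilder), and ResolvePath stands in for however package-qualified include paths are mapped to files:

using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class IncludeScanner
{
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Collects candidate type names from a workflow file and, depth-first,
    // from every workflow it includes. The visited set memoizes files that
    // were already inspected so shared includes are only read once per load.
    public static void CollectTypeNames(
        string path,
        HashSet<string> typeNames,
        HashSet<string> visited)
    {
        if (!visited.Add(path)) return; // already inspected during this load call

        var document = XDocument.Load(path);
        foreach (var element in document.Descendants())
        {
            // Illustrative: record xsi:type attributes as candidate extension types.
            var xsiType = (string)element.Attribute(Xsi + "type");
            if (xsiType != null) typeNames.Add(xsiType);

            // Illustrative: follow include references to other workflow files.
            var includePath = (string)element.Attribute("Path");
            if (includePath != null && includePath.EndsWith(".bonsai"))
            {
                CollectTypeNames(ResolvePath(includePath), typeNames, visited);
            }
        }
    }

    // Placeholder: mapping package-qualified references such as
    // "Aeon.Acquisition:TimestampGenerator.bonsai" to readable files is
    // handled elsewhere and not reproduced here.
    static string ResolvePath(string includePath) => includePath;
}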

An entirely different approach would be to have all include workflows loaded recursively together with the main workflow, but this is likely to require significant refactoring of the current infrastructure.

Benchmarks should be used to decide between the different options.

Explicit pre-loading of types to XmlSerializer

Another potentially more straightforward alternative would be to expose a new API to pre-load extension types into the XmlSerializer, and assume a worst case scenario by having the editor immediately pre-load all available extension types in the current environment at initialization time. Because extension types are locked at editor initialization, this would have the advantage of potentially ensuring only a single XML serializer is ever created.
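As a rough illustration only (the interface and method below are hypothetical, not an existing Bonsai API), the idea is that the editor would hand the full set of discovered extension types to the serialization layer once, at startup, before any workflow is opened:

using System;
using System.Collections.Generic;

// Hypothetical API sketch: pre-register every extension type discovered at
// editor initialization so that a single XmlSerializer covering all of them
// can be created up front.
public interface IWorkflowSerializerRegistry
{
    void RegisterExtensionTypes(IEnumerable<Type> types);
}

// Illustrative call site at editor startup:
//   registry.RegisterExtensionTypes(typeDiscovery.GetAllExtensionTypes());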

On the other hand, it might have the significant downside of ramping up IDE initialization latency, and also would not resolve initialization lag when running workflows from the CLI, where no explicit scanning for types is performed.

glopesdev commented 2 months ago

It turns out the above is only half of the story. There is another performance bottleneck in include workflows related to serialization of the values in externalized properties. Specifically, because we separately serialize all the values of externalized properties as XML elements inside the include workflow itself, we need to create XmlSerializer instances for these property types, which are different from the XmlSerializer instance that serializes the workflow file itself.

https://github.com/bonsai-rx/bonsai/blob/130de6fecd9b3a30c695436d15c6e3cda300072d/Bonsai.Core/Expressions/IncludeWorkflowBuilder.cs#L427

Worse, because the root XML element for each property depends on the property name, it is not enough to have a different XmlSerializer instance for each type. The current implementation naively creates an XmlSerializer with a different root element name for each combination of property name and property type, cached in a dictionary indexed by (string, Type) pairs. This results in a combinatorial explosion of XmlSerializer instances, which grows worse the more combinations of types and names we have externalized as parameters.
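In outline, the described caching behaves like the sketch below (illustrative, not the actual IncludeWorkflowBuilder code linked above): because the key includes the property name, each distinct name/type pair that appears across externalized properties forces a brand new serializer and its generated serialization assembly.

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

public static class PropertySerializerCache
{
    // Keyed by (property name, property type): every distinct pair gets its
    // own XmlSerializer because the root element name differs per property.
    static readonly Dictionary<(string, Type), XmlSerializer> cache =
        new Dictionary<(string, Type), XmlSerializer>();

    public static XmlSerializer GetSerializer(string propertyName, Type propertyType)
    {
        var key = (propertyName, propertyType);
        if (!cache.TryGetValue(key, out var serializer))
        {
            // This constructor overload generates (and does not cache) a new
            // serialization assembly for every new name/type combination.
            serializer = new XmlSerializer(propertyType, new XmlRootAttribute(propertyName));
            cache.Add(key, serializer);
        }
        return serializer;
    }
}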

There are two (possibly complementary) approaches to removing this bottleneck:

  1. Remove the combinatorial explosion of XmlSerializer instances by creating serializers only for distinct types, and rework the name of the root element by setting the XElement.Name property so that all serialization / deserialization goes through the same element name, e.g. Property.
  2. Pre-emptively load all include workflows to find out their property types. Investigate options for merging all property serializers into a single serializer.

Doing 1. seems more accessible right now since all properties are already serialized independently and there is always an XElement instance on hand. Changing the local element name is simple:

element.Name = XName.Get(SerializerPropertyName, element.Name.NamespaceName);
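A fuller sketch of approach 1 follows (illustrative; the fixed root name mirrors the SerializerPropertyName constant above, while the cache and helper are not the actual IncludeWorkflowBuilder code): serializers are cached per property type under a shared root element name, and a copy of the stored XElement is renamed before being handed to the serializer.

using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Serialization;

public static class TypedPropertySerializer
{
    const string SerializerPropertyName = "Property"; // shared root element name

    // One serializer per property type, regardless of the property name.
    static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();

    static XmlSerializer GetSerializer(Type propertyType)
    {
        if (!cache.TryGetValue(propertyType, out var serializer))
        {
            serializer = new XmlSerializer(
                propertyType,
                new XmlRootAttribute(SerializerPropertyName));
            cache.Add(propertyType, serializer);
        }
        return serializer;
    }

    public static object Deserialize(XElement element, Type propertyType)
    {
        // Rename a copy of the stored element to the shared root name so any
        // property name deserializes through the same per-type serializer.
        // Namespace handling is kept as in the one-line snippet above and may
        // need adjusting to match the serializer's expected root namespace.
        var local = new XElement(element);
        local.Name = XName.Get(SerializerPropertyName, element.Name.NamespaceName);
        using (var reader = local.CreateReader())
        {
            return GetSerializer(propertyType).Deserialize(reader);
        }
    }
}
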
glopesdev commented 2 months ago

Preliminary benchmarks implementing both serializer caching approaches above (eager loading of include workflow types and removing the combinatorial explosion of property serializers) indicate a performance improvement of possibly more than 10x when reading a complex workflow, from 30 seconds down to less than 1 second.