Mojang / DataFixerUpper

A set of utilities designed for incremental building, merging and optimization of data transformations.
MIT License
1.2k stars 138 forks source link

Data fixer performance - flatten composition and restructure optimisation rule #72

Closed Gegy closed 1 year ago

Gegy commented 1 year ago

Follow-up to #71 with the "not easy bits" - the more structural / logic changes.

To copy words from #71:

In basic: the optimisation process works by applying various rules that restructure the produced datafixing functions based on the algebraic rules given by the point-free style of functions. Many of these rules are applied one-at-a-time (many(once(...)) rule), where a single node in the point-free tree is rewritten before packing up the new modified tree. This particularly means that the same logic runs many, many times on the same data.

Although the previous PR made changes that improved the performance of rewriting with the once() rule, this PR replaces those entirely with a new optimisation rule that applies every rule within a single traversal of the tree. To do this, this PR flattens point-free function compositions into a linear array instead trees of nested compositions. This removes the need for the association rewriting rules, which took up a large portion of time themselves. Additionally, due to how they restructured the tree, it was infeasible to apply rewrites "all at once" to the same degree as is done here.

Flattening composition is enabled/made easier by a change to encode Type information in each PointFree instance directly, instead of only encoding intermediate types. This means PointFreeRule no longer needs to accept a Type. There is some weird behaviour with creating View instances: some types generate Views with different types to the underlying function. This is used in cases such as tagged fields, which need to tag the final type such that the overall View can construct a correct Codec.

Results

With the latest changes to #71, I measure an average of 8s to build fixers on a single thread. With this PR, the average time is ~1.2s.

The majority of this time is now spent building the initial fixer function, as opposed to optimisation: image

Gegy commented 1 year ago

Force-pushed a change to the optimisation rule that is equivalent, but optimises everything in a single pass without the need to retry.

bagucode commented 1 year ago

I am still kind of swamped with s23 work so it's better if someone else reviews this instead of me after all. I think I can contribute better if I can do random small reviews instead. But I see we have one more already so maybe it's not even needed?