Surprisingly, it turns out that one of the optimizations of Automa actually makes the code run slower.
I spent about two days building an elaborate codegen system to generate optimal membership code here, but it somehow turned out slower. The only explanation I can think of is that this code has fewer memory dependencies, so even though it does more operations, it can execute more instructions per cycle.
Anyway, it's nice to be able to speed up code by deleting code.