Open NickLarsen opened 5 years ago
Impressive that sizing things to line up and fit "cozily" within a page got a 20% perf improvement. When it gets to these levels, small changes do add up quickly.
Thinking through the idea of "inverting execution" that I mentioned during today's stream (May 29th), what I was trying to imagine was basically an implementation of the Visitor pattern: calling a method on the struct which executes the visitor, which returns the next element to be evaluated, and so on recursively to completion.
Again, I'll mention that I don't know how adding a method to the struct might affect performance or memory paging.
Ultimately, I think that going the code generation route is most likely to succeed in getting the necessary results.
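To make the "inverting execution" idea a bit more concrete, here is a minimal sketch in Rust (not the project's own language or code). Every name in it (`TreeNode`, `Step`, `visit`, `run_tree`) is hypothetical, and it assumes nodes live in a flat slice and refer to each other by index. The recursion is written as a loop, which is equivalent here because each step only hands back the next node.

```rust
/// Hypothetical node type: either a split on one feature or a leaf value.
#[derive(Clone, Copy, Debug)]
pub enum TreeNode {
    Split { feature: usize, threshold: f32, left: usize, right: usize },
    Leaf(f32),
}

/// What a node hands back to the driver after being visited.
pub enum Step {
    /// Index of the next node to evaluate.
    Next(usize),
    /// Evaluation is finished; this is the prediction.
    Done(f32),
}

impl TreeNode {
    /// The node decides for itself where execution goes next.
    pub fn visit(&self, features: &[f32]) -> Step {
        match *self {
            TreeNode::Leaf(value) => Step::Done(value),
            TreeNode::Split { feature, threshold, left, right } => {
                if features[feature] <= threshold {
                    Step::Next(left)
                } else {
                    Step::Next(right)
                }
            }
        }
    }
}

/// Driver: keeps asking the current node for the next one until a leaf is hit.
/// Assumes the root is at index 0 and the tree is well formed.
pub fn run_tree(nodes: &[TreeNode], features: &[f32]) -> f32 {
    let mut current = 0;
    loop {
        match nodes[current].visit(features) {
            Step::Next(index) => current = index,
            Step::Done(value) => return value,
        }
    }
}
```

In Rust, at least, a plain method like `visit` is not stored in the struct, so it does not change the node's size or layout. Whether the call itself costs anything depends on inlining, which would have to be measured.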
@hugodahl fixed the sigil, was pushing onto the stack in reverse order, super fast now
@hugodahl I cannot wait to show where we've gotten tomorrow on stream. At this point we could move to implementation details, but I want to try to squeeze out a little more perf before we do.
After your first reply, I was curious in an "after the kids get to bed, I want to tinker with it" kind of way. After that last message, I SO want to give it a go, but instead, I will save up the hype to catch up with the updates on stream. Cannot wait! And great find with the reversed order stack!!
@HugoDahl looks like your suspicion was valid: cramming the decision tree node down to 8 bytes gave a big perf bump (450ms down to 370ms, about 18% less time), so loading nodes is an issue. This sort of limits the size of the trees that can be used, but we can probably get away with it by having two implementations, one generic and one specific to this optimization.
Shrunk the size by making the feature index a short, the value a float, and the branches just bytes.
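For anyone following along, here is a rough sketch of what such an 8-byte node could look like, written in Rust rather than the project's own code. It assumes two branches per node, so 2 + 4 + 1 + 1 = 8 bytes. The names (`CompactNode`, `LEAF`, `evaluate`) are hypothetical, and so are the conventions that branches are indices into a flat node array and that a leaf is marked with a sentinel feature index. The float is placed first so that `#[repr(C)]` adds no padding.

```rust
use std::mem::size_of;

/// Hypothetical compact node: 4 + 2 + 1 + 1 = 8 bytes with no padding.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct CompactNode {
    /// Split threshold, or the prediction when this node is a leaf.
    pub value: f32,
    /// Feature index as a 16-bit integer; `LEAF` marks a leaf (assumed convention).
    pub feature: u16,
    /// Branch targets as byte-sized indices into the node array.
    pub left: u8,
    pub right: u8,
}

/// Assumed sentinel for leaf nodes; not taken from the original code.
pub const LEAF: u16 = u16::MAX;

// Compile-time check that the layout really is 8 bytes.
const _: () = assert!(size_of::<CompactNode>() == 8);

/// Walks the tree from index 0 until it reaches a leaf.
/// Assumes a well-formed tree; a malformed one could loop or panic.
pub fn evaluate(nodes: &[CompactNode], features: &[f32]) -> f32 {
    let mut current = 0usize;
    loop {
        let node = nodes[current];
        if node.feature == LEAF {
            return node.value;
        }
        current = if features[node.feature as usize] <= node.value {
            node.left as usize
        } else {
            node.right as usize
        };
    }
}
```

With byte-sized branch indices a single tree can address at most 256 nodes, which is the size limit mentioned above and the reason a second, generic implementation would still be needed for larger trees.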