Closed apaszke closed 7 years ago
The most straightforward changes that remove some STL containers from Eval::getSubgraph
cause the runtime to drop to 39ms for forward and 85ms for backward. And I still have things I can optimize, so I expect it to get even better!
Maybe simple Eval helps so much just because it doesn't perform any allocations for the graph traversal. We've been using 8 STL containers, and most of them were sets (which allocate a separate node for each pointer they hold :blush:).
I really don't think the DFS will be causing us any problems. We're doing it every time it runs the engine, but it always shows up to have a marginal overhead in the profiler (especially that we DFS over small hot parts of the graph that are very likely to be in the cache).
LGTM, but tests need fixing.
Seems good to me. I would need more background in the current system to have more substantive comments, but go ahead and merge once tests pass.
For first backward, the Eval wraps only a single function, and we can save a lot of allocations and cycles by skipping initialization of all data structures we need to perform the graphs search for a "general case".
This is one approach to do it, but now I think it might be even simpler to just provide another
replaceSubgraph
overload, that would do no graph traversal and would be used only by forward functions (which is a bit awkward, because we don't always know what C++ functions are forward/backward; I'm not sure if adding yet another virtual method is a good idea).Word language model run times: