ezyang / pytorch-unattached

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Add simple mode to Eval #215

Closed: apaszke closed this 7 years ago

apaszke commented 7 years ago

For the first backward pass, the Eval wraps only a single function, so we can save a lot of allocations and cycles by skipping the initialization of all the data structures needed to perform the graph search in the general case.

This is one way to do it, but now I think it might be even simpler to just provide another replaceSubgraph overload that does no graph traversal and is used only by forward functions (which is a bit awkward, because we don't always know which C++ functions are forward/backward; I'm not sure adding yet another virtual method is a good idea).
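Roughly, the fast path amounts to something like the sketch below. All names here (`replaceSingleFunction`, the `Function`/`Eval` stubs) are invented for illustration; the real Eval and replaceSubgraph interfaces in the codebase differ.

```cpp
// Hypothetical sketch of the "simple mode" idea: when an Eval wraps exactly
// one function, the enclosed subgraph is known up front, so no traversal or
// temporary containers are needed.
#include <memory>
#include <vector>

struct Function {                       // stand-in for an autograd function node
  std::vector<std::shared_ptr<Function>> next_functions;
};

struct Eval : Function {
  // General case: search the graph between placeholder inputs and outputs,
  // building visited sets and queues to find the enclosed subgraph (omitted).
  void replaceSubgraph(const std::vector<std::shared_ptr<Function>>& /*inputs*/,
                       const std::vector<std::shared_ptr<Function>>& /*outputs*/) {
    // ... full graph search with several STL containers ...
  }

  // Simple fast path: the Eval wraps a single function, so we can splice its
  // outgoing edges directly without any search or allocations.
  void replaceSingleFunction(std::shared_ptr<Function> fn) {
    next_functions = std::move(fn->next_functions);
  }
};
```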

Word language model run times:

|                   | forward | backward |
|-------------------|---------|----------|
| No JIT            | 40ms    | 72ms     |
| JIT               | 49ms    | 105ms    |
| JIT + simple Eval | 37ms    | 75ms     |

apaszke commented 7 years ago

The most straightforward changes that remove some STL containers from Eval::getSubgraph cause the runtime to drop to 39ms for forward and 85ms for backward. And I still have things I can optimize, so I expect it to get even better!

Maybe the simple Eval helps so much simply because it doesn't perform any allocations for the graph traversal. We've been using 8 STL containers, and most of them were sets (which allocate a separate node for each pointer they hold :blush:).
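For illustration, here is a minimal sketch (not the actual Eval::getSubgraph code) of why swapping a node-based set for a flat vector removes the per-element heap allocations during the traversal:

```cpp
// Illustrative only: a std::set allocates a tree node per stored pointer,
// while a flat std::vector reuses a single growing buffer.
#include <algorithm>
#include <set>
#include <vector>

struct Function {};

// Returns true if fn was not seen before; one allocation per new element.
bool visit_with_set(std::set<const Function*>& visited, const Function* fn) {
  return visited.insert(fn).second;
}

// Same check with a flat vector; for the small, hot subgraphs discussed here
// a linear scan is cheap and there is no per-element node allocation.
bool visit_with_vector(std::vector<const Function*>& visited, const Function* fn) {
  if (std::find(visited.begin(), visited.end(), fn) != visited.end())
    return false;
  visited.push_back(fn);
  return true;
}
```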

apaszke commented 7 years ago

I really don't think the DFS will cause us any problems. We do it every time the engine runs, but it always shows up with marginal overhead in the profiler (especially since we DFS over small, hot parts of the graph that are very likely to be in cache).
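A minimal sketch of the kind of traversal being described, with hypothetical `Function` nodes and a reused stack buffer, just to show how little work the DFS does over a small hot subgraph:

```cpp
// Hypothetical explicit-stack DFS over a small region of the autograd graph.
// The stack buffer is reused across calls, so the walk itself allocates
// nothing in the steady state.
#include <vector>

struct Function {
  std::vector<Function*> next_functions;
  bool seen = false;                        // per-node stamp instead of a visited set
};

void dfs(Function* root, std::vector<Function*>& stack /* reused buffer */) {
  stack.clear();
  stack.push_back(root);
  while (!stack.empty()) {
    Function* fn = stack.back();
    stack.pop_back();
    if (fn->seen) continue;
    fn->seen = true;
    for (Function* next : fn->next_functions)
      stack.push_back(next);
  }
}
```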

ezyang commented 7 years ago

LGTM, but tests need fixing.

zdevito commented 7 years ago

Seems good to me. I would need more background in the current system to have more substantive comments, but go ahead and merge once tests pass.