clab / dynet

DyNet: The Dynamic Neural Network Toolkit
Apache License 2.0

Feature Request: Operation Caching #579

Open neubig opened 7 years ago

neubig commented 7 years ago

One of the major computational and memory efficiency bottlenecks in DyNet code is when the same operation is called multiple times within a for loop. This code can be sped up and made significantly more memory efficient by moving that repetitive call outside the for loop.

Unfortunately many people don't realize this. It would be nice if DyNet had the ability to detect when people were doing the same operation multiple times, and re-use the results if so. The only difficulty is how to make this caching fast enough that it doesn't impact people adversely when they are already using optimized code. Perhaps we can add it as a flag?
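The anti-pattern described above could be sketched like this (a toy Python illustration, not DyNet's actual API; `Graph` and `affine` are made-up stand-ins for graph construction and an expensive operation):

```python
# Toy model of a computation graph: each call to an operation allocates a
# new node, mimicking how repeated ops inside a loop grow memory use.
class Graph:
    def __init__(self):
        self.nodes = []                  # each entry is one allocated node

    def affine(self, W, x):              # stand-in for an expensive op like W * x
        self.nodes.append(("affine", W, x))
        return len(self.nodes) - 1       # return the new node's id

# Anti-pattern: the identical transform is rebuilt on every iteration.
g1 = Graph()
for _ in range(100):
    h = g1.affine("W", "x")              # 100 identical nodes allocated

# Hoisted version: build the node once, reuse its id inside the loop.
g2 = Graph()
h = g2.affine("W", "x")                  # one node
for _ in range(100):
    pass                                 # use h here

print(len(g1.nodes), len(g2.nodes))      # 100 vs. 1
```

The proposed caching feature would make the first version behave like the second automatically.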

redpony commented 7 years ago

I think we can detect this case, and we could just return an alias to the original node without warning the user, since the overhead would be minimal (well, hopefully minimal). One simple way to do this: in addition to a node's batching signature, each candidate node would need to have a "what am I computing?" signature, which would be computed recursively with a forward-like computation. We could even build in some logic for recognizing the identity of things like (x1 + x2) and (x2 + x1). Obviously there are limits: even something as simple as ((x1 + x3) - x2) vs ((x1 - x2) + x3) would be hard to detect.
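This recursive signature idea could be sketched as follows (an assumed design, not DyNet code; `CachingGraph`, `leaf`, and `op` are hypothetical names). Each node's signature is built from its operation name and its children's signatures, and commutative ops sort their argument signatures so that (x1 + x2) and (x2 + x1) collide:

```python
# Ops whose argument order doesn't affect the result.
COMMUTATIVE = {"add", "mul"}

class CachingGraph:
    def __init__(self):
        self.cache = {}    # structural signature -> node id
        self.nodes = []    # node payloads, indexed by id

    def leaf(self, name):
        # Inputs/parameters get a base signature from their name.
        return self._node(("leaf", name))

    def op(self, name, *children):
        sigs = tuple(self.nodes[c] for c in children)
        if name in COMMUTATIVE:
            sigs = tuple(sorted(sigs))       # canonicalize argument order
        return self._node((name, sigs))

    def _node(self, sig):
        if sig in self.cache:
            return self.cache[sig]           # alias to the original node
        self.nodes.append(sig)
        self.cache[sig] = len(self.nodes) - 1
        return self.cache[sig]

g = CachingGraph()
x1, x2 = g.leaf("x1"), g.leaf("x2")
a = g.op("add", x1, x2)
b = g.op("add", x2, x1)    # recognized as the same computation
print(a == b)              # True: x2 + x1 aliases x1 + x2
```

As the comment notes, this only catches syntactic identity (plus simple canonicalizations); algebraically equal but structurally different expressions like ((x1 + x3) - x2) vs ((x1 - x2) + x3) would still get distinct signatures.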

We might be able to use this mechanism to handle parameter references (I think we have some special casing for that?).


neubig commented 7 years ago

Yeah, that's what I was thinking. I actually don't think we even need to calculate the signature recursively per se, as each node has access to the IDs of its arguments, which can be used as unique identifiers. The idea of identifying identities is a good one. Anyway, if we're not too worried about efficiency this should be easy to hack up, so I'll try to do it.
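The non-recursive variant could be sketched like this (again an assumed design, with made-up names): because every cached node already has a canonical id, a flat signature of (operation, argument ids) suffices; equal subgraphs bottom out at equal ids, so no recursive hashing is needed:

```python
# Sketch: flat signatures built from operation name + argument node ids.
class Graph:
    def __init__(self):
        self.cache = {}    # (op, arg_ids) -> node id
        self.n = 0         # next fresh node id

    def node(self, op, *arg_ids):
        key = (op, arg_ids)            # flat, non-recursive signature
        if key not in self.cache:      # first time: allocate a fresh node
            self.cache[key] = self.n
            self.n += 1
        return self.cache[key]         # otherwise: alias the original

g = Graph()
x = g.node("input_x")
W = g.node("param_W")
h1 = g.node("matmul", W, x)
h2 = g.node("matmul", W, x)    # hits the cache, returns the same id
print(h1 == h2)                # True: only 3 nodes exist, not 4
```

Because parameter nodes would be deduplicated by the same mechanism, this is also what would let the generalized cache subsume the existing special casing for parameter references.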

And re: parameter references, yes, the Python code has special casing for this. I think basically if we implement the generalized operation caching capability properly it should subsume the special casing for parameters.

ming-wei-chang-zz commented 7 years ago

At ACL in Vancouver, I was talking to Yoav about this, and he pointed out that there is an open issue about it. I would be very happy if DyNet could implement this.

I would like to point out the importance of node or operation caching for inference-based training (beam search, dynamic programming, or other inference algorithms). I was doing my SQA project using inference-based training algorithms, and found that automatic caching would be very useful. Writing and maintaining all the caches by hand can be tedious and error-prone.