Closed · jasonrute · 1 year ago
Actually, I decided to just do things right and fix up `_hidden_state_sequences` to use the original graph instead of the stacked graph. Then we can avoid stacking graphs just to get out one tensor. We can always have the batch dimension first, avoiding the need for the ragged transpose function. Can you look this over again, @mirefek, now that the code is a bit different?
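To illustrate the batch-dimension-first layout, here is a minimal sketch (not the actual `_hidden_state_sequences` code); `node_embeddings`, `context_node_ids`, and `row_lengths` are hypothetical names standing in for the flat GNN output and the per-proof-state context bookkeeping:

```python
import tensorflow as tf

hdim = 8
# Flat [total_nodes, hdim] output of the GNN over the original (unstacked) graphs.
node_embeddings = tf.random.normal([10, hdim])
# Hypothetical bookkeeping: which nodes form each local context, and how many
# context nodes each of the 2 proof states has.
context_node_ids = tf.constant([0, 2, 3, 5, 6, 7, 9])
row_lengths = tf.constant([3, 4])

context_embeddings = tf.gather(node_embeddings, context_node_ids)

# Build the ragged tensor with the batch dimension first, shape [batch, None(cxt), hdim],
# so no ragged transpose is needed downstream.
hidden_state_sequences = tf.RaggedTensor.from_row_lengths(context_embeddings, row_lengths)
print(hidden_state_sequences.shape)  # (2, None, 8)
```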
This PR significantly speeds up the neural network part of inference. (It still doesn't address beam search, which is coming in another PR.) The idea is a continuation of the previous PR, as follows. After computing the graph embeddings, we choose `k` tactics and compute arguments for those tactics. However, the available local and global context is independent of the chosen tactics. So this allows for three optimizations:

- The keys (the embeddings of the local and global context) have ragged shape `[batch, None(cxt), hdim]`, where `batch` is the original batch size (1 in our case).
- All `k` tactics are combined under the same batch element. The queries then have ragged shape `[batch, None(tactic-args), hdim]`, where `batch` is just the original batch size (in our case 1).

These three optimizations, when combined, significantly speed up the model. Previously, most of the time was spent multiplying queries and keys to get the logits (or transforming tensors to get into and out of the right shape to multiply). This eliminates all that. Moreover, surprisingly to me, by reducing the shape of the key tensor we also significantly speed up the time spent in the einsum operator itself (even though we still multiply the same number of query-key pairs together).
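To make the shapes concrete, here is a minimal sketch of the query/key logit computation with these ragged shapes (not the exact code in this PR); `keys`, `queries`, and the padding via `.to_tensor()` are illustrative assumptions, with batch size 1 as in our case:

```python
import tensorflow as tf

batch, hdim = 1, 8

# Keys: one set of context embeddings per proof state, shape [batch, None(cxt), hdim].
keys = tf.RaggedTensor.from_row_lengths(tf.random.normal([5, hdim]), [5])
# Queries: the argument slots of all k chosen tactics under the same batch element,
# shape [batch, None(tactic-args), hdim].
queries = tf.RaggedTensor.from_row_lengths(tf.random.normal([7, hdim]), [7])

# One batched einsum over the (padded) tensors produces all argument logits at once,
# instead of tiling the keys once per tactic and reshaping back and forth.
logits = tf.einsum('bqh,bkh->bqk', queries.to_tensor(), keys.to_tensor())
print(logits.shape)  # (1, 7, 5): one logit per (argument slot, context element) pair
```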