core_gather.cu is a memory-efficient version that expects log_probs with the shape (N, T, U, 2), holding only the blank and label values. It performs especially well with a large vocabulary.
Hello, what does gather mean, and when should I set gather to True? Does the log_probs shape (N, T, U, 2) mean there are only two classes, blank and the labels?
This flag reduces the memory allocated for the array of gradients. You can look at the implementation for details. I highly recommend using it when the output vocabulary V is large.
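To illustrate the shape question, here is a minimal NumPy sketch of what such a gather step could look like. It is not the library's own code: the function name `gather_blank_and_label`, the label layout (N, U-1), and blank index 0 are assumptions made for the example. The idea is that the last dimension of size 2 does not mean a two-class vocabulary; for every position (t, u) it keeps only the blank log-probability and the log-probability of the next target label, out of the full V entries.

```python
import numpy as np

def gather_blank_and_label(log_probs, labels, blank=0):
    """Reduce (N, T, U, V) log-probs to (N, T, U, 2).

    Hypothetical helper for illustration only.
    Slot 0 holds the blank log-prob, slot 1 the log-prob of the
    label expected next at position u. Assumes labels has shape (N, U-1).
    """
    N, T, U, V = log_probs.shape
    # Start with the blank index everywhere; the last u has no next label,
    # so its slot 1 simply stays pointed at blank as padding.
    index = np.full((N, T, U, 2), blank, dtype=np.int64)
    # Broadcast each sequence's labels over the time axis.
    index[:, :, :U - 1, 1] = labels[:, None, :]
    return np.take_along_axis(log_probs, index, axis=3)

# Small example: vocabulary of 5 symbols, blank = 0.
rng = np.random.default_rng(0)
N, T, U, V = 2, 3, 4, 5
logits = rng.standard_normal((N, T, U, V))
# Log-softmax over the vocabulary axis.
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
labels = rng.integers(1, V, size=(N, U - 1))

compact = gather_blank_and_label(log_probs, labels)
print(compact.shape)  # (2, 3, 4, 2)
```

The compact array is V/2 times smaller than the full one, which is why the saving grows with the vocabulary size.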