Open Michael0x2a opened 7 years ago
Well, I looked into using dynamically-sized RNNs and feeding in the sequence lengths, among other things, and got basically the same results, except that each epoch took 3 times longer to run...
I'm a bit stumped as to how I'd apply a mask before computing the loss, given that I'm computing the softmax cross entropy against the output labels (and so a mask doesn't seem applicable).
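For reference, one way a mask could still fit here, as a rough sketch and not necessarily what this model needs: compute the cross entropy per timestep, skip the padded steps, and average over the real ones only. The function name and the `logits` / `labels` / `lengths` layout below are assumptions made for illustration, written in plain Python so the arithmetic is explicit and independent of any framework.

```python
import math


def masked_softmax_cross_entropy(logits, labels, lengths):
    """Mean softmax cross entropy over real (non-padded) timesteps only.

    logits:  [batch][time][classes] raw scores (hypothetical layout)
    labels:  [batch][time] integer class ids; values at padded steps are ignored
    lengths: [batch] number of real timesteps in each sequence
    """
    total, count = 0.0, 0
    for seq_logits, seq_labels, length in zip(logits, labels, lengths):
        for t, (scores, label) in enumerate(zip(seq_logits, seq_labels)):
            if t >= length:
                # The mask: a padded step adds nothing to the loss.
                continue
            # Numerically stable log-sum-exp of the scores.
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            # Cross entropy for an integer label is log_z - score[label].
            total += log_z - scores[label]
            count += 1
    # Divide by the number of real steps, not by batch * time.
    return total / max(count, 1)
```

The design point is the denominator: dividing by the count of unmasked steps keeps batches with a lot of padding from looking artificially good or bad.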
Ideas to explore:
Potential issues: