Closed — ClashLuke closed this issue 2 years ago
Currently, the performance improvements are marginal. One possible explanation is that the model doesn't use the context of the depthwise block. To test whether this is happening, I'll start another run without the bottleneck block and QRNN.
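For reference, the block being ablated could look roughly like the sketch below: a pointwise projection down to a bottleneck width, a depthwise convolution over the sequence, and a projection back up. This is a hypothetical illustration, assuming PyTorch and made-up shapes, not the repo's actual implementation.

```python
import torch
from torch import nn


class BottleneckDepthwiseBlock(nn.Module):
    """Hypothetical sketch: 1x1 down-projection, causal depthwise conv, 1x1 up-projection."""

    def __init__(self, features: int, bottleneck: int, kernel: int):
        super().__init__()
        self.down = nn.Conv1d(features, bottleneck, 1)
        # groups=channels makes the convolution depthwise (one filter per channel)
        self.depthwise = nn.Conv1d(bottleneck, bottleneck, kernel,
                                   padding=kernel - 1, groups=bottleneck)
        self.up = nn.Conv1d(bottleneck, features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, features, sequence]
        y = self.depthwise(self.down(x))[..., :x.size(-1)]  # crop right pad -> causal
        return self.up(y)


block = BottleneckDepthwiseBlock(features=64, bottleneck=16, kernel=5)
out = block(torch.randn(2, 64, 128))  # shape preserved: [2, 64, 128]
```

Removing the bottleneck projections would leave only the depthwise convolution at full width, which is what the ablation run would isolate.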
Dilated convolution improves per-step convergence a bit. However, it's also much slower, so we might want to re-evaluate its context size. For the same wall time, dilated convolution underperforms the dense one.
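The trade-off here is that stacking dilated convolutions grows the receptive field much faster than dense ones for the same parameter count, which is why it helps per-step convergence; a rough sketch of that arithmetic (kernel size 3, four layers, assumed values for illustration):

```python
def receptive_field(kernel: int, dilations: list[int]) -> int:
    """Receptive field of stacked 1D convolutions with the given dilation per layer."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d  # each layer extends reach by (kernel - 1) * dilation
    return rf


dense = receptive_field(3, [1, 1, 1, 1])    # all dilations 1 -> 9 timesteps
dilated = receptive_field(3, [1, 2, 4, 8])  # doubling dilations -> 31 timesteps
```

So the dilated stack sees roughly 3x the context with identical parameters and FLOPs per position, but the scattered memory access pattern can make each step much slower in practice, matching the wall-time result above.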