Closed by XuhuiZhou 6 years ago
In neuronlp2/models/parsing.py, BiRecurrentConvBiAffine uses Dropout2d as its dropout layer. Could you explain why you chose it? For example, with an output of shape [32, 35, 512], you first call transpose(1, 2), then apply the dropout, and then transpose back. Why do you transpose before the dropout?

Hi Edaward,

The reason we use Dropout2d instead of Dropout is that we want to drop whole tokens (word, POS, ...): when a token is dropped, its entire embedding is zeroed out rather than only some of its individual units. The transpose before the dropout makes sure that the dimension I want to drop sits in the position Dropout2d acts on.
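To make the effect concrete, here is a minimal sketch (not code from the repo) that spells the dropout masks out by hand, using toy shapes in place of [32, 35, 512]. It shows which slices of a [batch, length, hidden] tensor get zeroed depending on whether the channel dimension seen by Dropout2d lines up with the token axis or, after transpose(1, 2), with the feature axis:

```python
import torch

torch.manual_seed(0)

# Toy version of the shapes in the question ([32, 35, 512]), shrunk so the
# printed tensors are readable.
batch, length, hidden = 2, 5, 4
output = torch.randn(batch, length, hidden)
p = 0.5

# nn.Dropout(p) zeroes individual entries independently.
# nn.Dropout2d(p) instead samples one Bernoulli value per (sample, channel)
# and zeroes that whole slice along dim 1, rescaling survivors by 1/(1-p).
# (How Dropout2d treats 3D inputs has changed across PyTorch versions, which
# is why the masks are written out explicitly here.)

# Without a transpose, dim 1 is the token axis: a dropped "channel" is an
# entire token embedding, i.e. word-level dropout.
token_mask = (torch.rand(batch, length, 1) > p).float() / (1 - p)
dropped_tokens = output * token_mask

# With transpose(1, 2) first, as in parsing.py, dim 1 becomes the feature
# axis: a dropped "channel" is one hidden unit zeroed across every position,
# so the same mask is shared along the whole sequence.
feature_mask = (torch.rand(batch, 1, hidden) > p).float() / (1 - p)
dropped_features = output * feature_mask

print(dropped_tokens[0])    # whole rows (tokens) are zero
print(dropped_features[0])  # whole columns (hidden units) are zero
```

In both arrangements the surviving entries are rescaled by 1/(1 - p), just as Dropout2d does, so the expected activation is unchanged during training; the transpose only decides which axis gets dropped as a block.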