Closed (opringle closed this issue 5 years ago)
Hi @opringle,
Thanks for your feedback.
The activation functions (e.g. ReLU) are mentioned fairly thoroughly in the paper. AFAIK, the first Temp Conv, 64 layer acts as an n-gram generator and is followed directly by the first convolutional block.
Thus the first layers of the network are as follows:

1. Lookup table (character embedding)
2. Temp Conv, 64 (kernel size 3), with no BN/ReLU of its own
3. First convolutional block: Conv 64 -> BN -> ReLU -> Conv 64 -> BN -> ReLU
Seeing back-to-back convolutions at the early stage of the network did not surprise me much, as I figured it would help craft more features (convolutions) before selecting them (ReLU, pooling, etc.).
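For reference, here is a minimal PyTorch sketch of that stem as I read the paper; it is not taken from either of our repos, and the `VDCNNStem` name, vocabulary size, and embedding dimension are just illustrative assumptions.

```python
# Minimal sketch (illustration only): embedding lookup, the initial
# Temp Conv 64 with no activation, then the first convolutional block
# made of two (Conv -> BN -> ReLU) stages.
import torch
import torch.nn as nn


class VDCNNStem(nn.Module):
    def __init__(self, vocab_size=69, embed_dim=16, num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # First Temp Conv, 64: plain convolution, no BN/ReLU between it
        # and the first conv block (the back-to-back convolutions above).
        self.temp_conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        # First convolutional block: Conv -> BN -> ReLU, twice.
        self.conv_block = nn.Sequential(
            nn.Conv1d(num_filters, num_filters, kernel_size=3, padding=1),
            nn.BatchNorm1d(num_filters),
            nn.ReLU(inplace=True),
            nn.Conv1d(num_filters, num_filters, kernel_size=3, padding=1),
            nn.BatchNorm1d(num_filters),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                  # x: (batch, seq_len) character ids
        h = self.embed(x).transpose(1, 2)  # -> (batch, embed_dim, seq_len)
        h = self.temp_conv(h)              # -> (batch, 64, seq_len)
        return self.conv_block(h)          # -> (batch, 64, seq_len)


if __name__ == "__main__":
    stem = VDCNNStem()
    chars = torch.randint(0, 69, (2, 1024))  # two sequences of 1024 characters
    print(stem(chars).shape)                 # torch.Size([2, 64, 1024])
```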
Did you manage to replicate the results with your implementation of VDCNN?
Cheers!
**Original question (@opringle):** Should there not be an activation function after the first 3, Temp Conv, 64 layer? The paper does not specify, but in my own implementation I assumed every convolutional layer should be followed by batch normalization + ReLU.