Closed (opringle closed this issue 5 years ago)
Hi @opringle,
Thanks for your feedback.
The activation functions (e.g. ReLU) are mentioned fairly thoroughly in the paper. AFAIK, the first Temp Conv, 64 layer acts as an n-gram generator and is followed directly by the first convolutional block.
Thus the first layers of the network are as follows:

1. Lookup table (character embedding)
2. Temp Conv, 64 (kernel size 3), with no BN/ReLU of its own
3. First convolutional block: Conv 64 -> BN -> ReLU -> Conv 64 -> BN -> ReLU
Seeing back-to-back convolutions at the early stage of the network did not surprise me much, as I figured it would help craft more features (convolutions) before selecting them (ReLU, pooling, etc.).
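For reference, here is a minimal PyTorch sketch of that stem as I read the paper; it is not taken from either of our repos, and the `VDCNNStem` name, vocabulary size, and embedding dimension are just illustrative assumptions.

```python
# Minimal sketch (illustration only): embedding lookup, the initial
# Temp Conv 64 with no activation, then the first convolutional block
# made of two (Conv -> BN -> ReLU) stages.
import torch
import torch.nn as nn


class VDCNNStem(nn.Module):
    def __init__(self, vocab_size=69, embed_dim=16, num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # First Temp Conv, 64: plain convolution, no BN/ReLU between it
        # and the first conv block (the back-to-back convolutions above).
        self.temp_conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        # First convolutional block: Conv -> BN -> ReLU, twice.
        self.conv_block = nn.Sequential(
            nn.Conv1d(num_filters, num_filters, kernel_size=3, padding=1),
            nn.BatchNorm1d(num_filters),
            nn.ReLU(inplace=True),
            nn.Conv1d(num_filters, num_filters, kernel_size=3, padding=1),
            nn.BatchNorm1d(num_filters),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                  # x: (batch, seq_len) character ids
        h = self.embed(x).transpose(1, 2)  # -> (batch, embed_dim, seq_len)
        h = self.temp_conv(h)              # -> (batch, 64, seq_len)
        return self.conv_block(h)          # -> (batch, 64, seq_len)


if __name__ == "__main__":
    stem = VDCNNStem()
    chars = torch.randint(0, 69, (2, 1024))  # two sequences of 1024 characters
    print(stem(chars).shape)                 # torch.Size([2, 64, 1024])
```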
Did you manage to replicate the results with your implementation of VDCNN?
Cheers!
**Original question (@opringle):** Should there not be an activation function after the first 3, Temp Conv, 64 layer? The paper does not specify, but in my own implementation I assumed every convolutional layer should be followed by batch normalization + ReLU.