I've run your code on the AG News dataset. I get high accuracy during training, but relatively lower and unstable accuracy during testing. If I set is_training=True in the test step, I get a good result. Is there a problem with the batch norm?
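To make the symptom concrete, here is a minimal pure-Python sketch of one possible cause. It assumes the moving mean and variance are still close to their initial values at test time, for example because the moving-average update ops never ran during training. The `batch_norm` function and the sample values are hypothetical illustrations, not code from this repository.

```python
def batch_norm(batch, moving_mean, moving_var, is_training, eps=1e-5):
    """Normalize one feature (a list of floats) the way batch norm does."""
    if is_training:
        # Training mode: use the statistics of the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Inference mode: use the stored moving averages.
        mean, var = moving_mean, moving_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


# Hypothetical activations whose real statistics are far from the
# initial moving averages (mean 0, variance 1).
batch = [10.0, 12.0, 14.0, 16.0]

# Assumption: the moving averages were never updated, so they are stale.
train_out = batch_norm(batch, 0.0, 1.0, is_training=True)
test_out = batch_norm(batch, 0.0, 1.0, is_training=False)

print(train_out)  # centered around zero
print(test_out)   # almost unchanged, far from what later layers saw in training
```

If that is what is happening, the layers after batch norm receive very differently scaled inputs at test time, which would be consistent with the test accuracy recovering when is_training=True.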
What is the purpose of fixed_padding after the pooling layers? I didn't see such an operation in the original paper.
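For reference, this is roughly what I would expect a helper with that name to do, written as a 1D pure-Python sketch. It assumes explicit padding that depends only on the kernel size and not on the input length; the actual definition in this repository may differ, and the `pad_value` parameter is my own addition for illustration.

```python
def fixed_padding(seq, kernel_size, pad_value=0):
    """Pad a 1D sequence by kernel_size - 1 elements in total.

    Assumption: the split between the two sides depends only on the
    kernel size, so the padding is the same for every input length.
    """
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return [pad_value] * pad_beg + list(seq) + [pad_value] * pad_end


padded_odd = fixed_padding([1, 2, 3], kernel_size=3)
padded_even = fixed_padding([1, 2, 3], kernel_size=4)
print(padded_odd)
print(padded_even)
```

If the helper works like this, its effect would be to keep the output length predictable for strided convolutions, but that still would not explain why it is applied after pooling.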