Usually max_pool2d is applied after each (relu) activation, but you apply max_pool2d before each activation. Did you try both ways and one worked better?
With a max-pooling layer and ReLU, the order does not matter: both orderings compute exactly the same result, because max-pooling and ReLU are both monotonically non-decreasing, so `relu(max_pool2d(x)) == max_pool2d(relu(x))`. However, `relu(max_pool2d(conv2(x)))` is significantly faster, since pooling first shrinks the feature map (e.g. by 4x for a 2x2 pool), so ReLU is applied to far fewer elements.
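A quick sketch to check the equivalence numerically (assuming PyTorch is installed; the input shape is arbitrary, chosen just for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)  # hypothetical input: batch of 1, 3 channels, 8x8

a = F.relu(F.max_pool2d(x, 2))  # pool first, then ReLU (fewer ReLU ops)
b = F.max_pool2d(F.relu(x), 2)  # ReLU first, then pool

print(torch.equal(a, b))  # True: max and ReLU commute since both are monotone
```

Both tensors are bitwise identical, because each ordering ultimately clamps the same selected maximum at zero; only the amount of work differs.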