Open JoshVarty opened 5 years ago
INFO net.py: 259: data : (2, 3, 512, 896) => conv1 : (2, 64, 256, 448) ------- (op: Conv)
Args:[name: "kernel"i: 7, name: "order"s: "NCHW", name: "pad"i: 3, name: "stride"i: 2, name: "exhaustive_search"i: 0]
INFO net.py: 259: conv1 : (2, 64, 256, 448) => conv1 : (2, 64, 256, 448) ------- (op: AffineChannel)
INFO net.py: 259: conv1 : (2, 64, 256, 448) => conv1 : (2, 64, 256, 448) ------- (op: Relu)
Args:[name: "order"s: "NCHW", name: "cudnn_exhaustive_search"i: 0]
INFO net.py: 259: conv1 : (2, 64, 256, 448) => pool1 : (2, 64, 128, 224) ------- (op: MaxPool)
Args:[name: "order"s: "NCHW", name: "cudnn_exhaustive_search"i: 0, name: "kernel"i: 3, name: "pad"i: 1, name: "stride"i: 2]
✓(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
✓(affineChannel): AffineChannel()
✓(relu): ReLU()
✓(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
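As a quick sanity check on the stem above, the logged output sizes can be recomputed from the kernel, stride, and padding values using the usual floor-mode formula `out = floor((in + 2*pad - kernel) / stride) + 1`. This is a minimal sketch in plain Python. `conv_out` and `stem_shape` are hypothetical helpers, not part of Detectron or PyTorch, and the sketch assumes dilation 1 and `ceil_mode=False`, as in the printout above.

```python
def conv_out(size, kernel, stride, pad):
    """Output length of one spatial dim for a conv/pool layer (dilation 1, floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


def stem_shape(n, h, w):
    """Shapes after conv1 and pool1 for an (n, 3, h, w) input."""
    # conv1: 7x7 kernel, stride 2, pad 3, 64 output channels
    h, w = conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)
    conv1 = (n, 64, h, w)
    # AffineChannel and ReLU are elementwise, so they leave the shape unchanged.
    # pool1: 3x3 max pool, stride 2, pad 1
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    pool1 = (n, 64, h, w)
    return conv1, pool1


conv1, pool1 = stem_shape(2, 512, 896)
print(conv1)  # (2, 64, 256, 448)
print(pool1)  # (2, 64, 128, 224)
```

Both results agree with the `conv1` and `pool1` shapes in the Detectron log, so at least the shapes of this piece line up between the two models.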
Differences:
We're starting to get close, but some differences remain. Currently, my network occasionally gets exploding gradients near the start of training.
Let's start by taking a look at each model to ensure things look correct.
Since there's so much going on, we'll break it into different pieces and compare those one at a time.
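One way to make the piece-by-piece comparison less error-prone is to turn the Detectron log lines into structured records that can be checked against the PyTorch side. The following is a rough sketch, assuming the log lines have the format shown in this issue. `LINE_RE`, `parse_shape`, and `parse_log` are hypothetical names introduced here for illustration, not part of Detectron.

```python
import re

# Matches lines such as:
# INFO net.py: 259: data : (2, 3, 512, 896) => conv1 : (2, 64, 256, 448) ------- (op: Conv)
LINE_RE = re.compile(
    r"(\w+)\s*:\s*\(([\d,\s]+)\)\s*=>\s*(\w+)\s*:\s*\(([\d,\s]+)\)\s*-+\s*\(op:\s*(\w+)\)"
)


def parse_shape(text):
    """Turn '2, 3, 512, 896' into (2, 3, 512, 896)."""
    return tuple(int(x) for x in text.split(",") if x.strip())


def parse_log(lines):
    """Return (op, input blob, input shape, output blob, output shape) per op line."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip 'Args:[...]' lines and anything else that is not an op line
        src, in_shape, dst, out_shape, op = m.groups()
        records.append((op, src, parse_shape(in_shape), dst, parse_shape(out_shape)))
    return records


# Sample input copied from the log in this issue.
LOG = [
    "INFO net.py: 259: data : (2, 3, 512, 896) => conv1 : (2, 64, 256, 448) ------- (op: Conv)",
    'Args:[name: "kernel"i: 7, name: "order"s: "NCHW", name: "pad"i: 3, name: "stride"i: 2, name: "exhaustive_search"i: 0]',
    "INFO net.py: 259: conv1 : (2, 64, 256, 448) => conv1 : (2, 64, 256, 448) ------- (op: AffineChannel)",
    "INFO net.py: 259: conv1 : (2, 64, 256, 448) => conv1 : (2, 64, 256, 448) ------- (op: Relu)",
    "INFO net.py: 259: conv1 : (2, 64, 256, 448) => pool1 : (2, 64, 128, 224) ------- (op: MaxPool)",
]

records = parse_log(LOG)
for op, src, in_shape, dst, out_shape in records:
    print(f"{op:14s} {src} {in_shape} -> {dst} {out_shape}")
```

Each record can then be compared by hand (or in a loop) with the output shape of the corresponding PyTorch layer, one piece at a time.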