JoshVarty / pytorch-retinanet

Reproducing the Detectron implementation of RetinaNet
MIT License

Compare Models #6

Open JoshVarty opened 5 years ago

JoshVarty commented 5 years ago

We're getting close, but some differences remain. Currently, my network occasionally hits exploding gradients near the start of training.

Let's start by taking a look at each model to ensure things look correct.

Since there's so much going on, we'll break it into different pieces and compare those one at a time.

JoshVarty commented 5 years ago

Network Stem

Detectron

INFO net.py: 259: data                        : (2, 3, 512, 896)     => conv1                       : (2, 64, 256, 448)    ------- (op: Conv)
                Args:[name: "kernel"i: 7, name: "order"s: "NCHW", name: "pad"i: 3, name: "stride"i: 2, name: "exhaustive_search"i: 0]
INFO net.py: 259: conv1                       : (2, 64, 256, 448)    => conv1                       : (2, 64, 256, 448)    ------- (op: AffineChannel)
INFO net.py: 259: conv1                       : (2, 64, 256, 448)    => conv1                       : (2, 64, 256, 448)    ------- (op: Relu)
                Args:[name: "order"s: "NCHW", name: "cudnn_exhaustive_search"i: 0]
INFO net.py: 259: conv1                       : (2, 64, 256, 448)    => pool1                       : (2, 64, 128, 224)    ------- (op: MaxPool)
                Args:[name: "order"s: "NCHW", name: "cudnn_exhaustive_search"i: 0, name: "kernel"i: 3, name: "pad"i: 1, name: "stride"i: 2]
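
To compare piece by piece without reading the log by eye, the `net.py` lines above can be turned into tuples. This is only a rough sketch: the regular expression assumes every op line has the `input : (shape) => output : (shape) ------- (op: Name)` layout shown here, and `parse_detectron_log` is a hypothetical helper name, not something from Detectron.

```python
import re

# Matches op lines such as:
#   INFO net.py: 259: data : (2, 3, 512, 896) => conv1 : (2, 64, 256, 448) ------- (op: Conv)
# Assumption: all op lines in the log follow this layout.
LINE_RE = re.compile(
    r"INFO net\.py:\s*\d+:\s*"
    r"(?P<src>\S+)\s*:\s*\((?P<src_shape>[^)]*)\)\s*=>\s*"
    r"(?P<dst>\S+)\s*:\s*\((?P<dst_shape>[^)]*)\)\s*-+\s*"
    r"\(op:\s*(?P<op>\w+)\)"
)


def parse_shape(text):
    """Turn '2, 64, 256, 448' into (2, 64, 256, 448)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def parse_detectron_log(log_text):
    """Return a list of (op, input_blob, input_shape, output_blob, output_shape)."""
    ops = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skips the indented "Args:[...]" continuation lines
        ops.append((
            match["op"],
            match["src"],
            parse_shape(match["src_shape"]),
            match["dst"],
            parse_shape(match["dst_shape"]),
        ))
    return ops
```

Each tuple can then be checked against the shape produced by the matching PyTorch module. The `Args:[...]` lines are ignored here, so kernel, pad and stride still need to be compared by hand.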

PyTorch

✓(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
✓(affineChannel): AffineChannel()
✓(relu): ReLU()
✓(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
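
As a sanity check on the shapes, the stem arithmetic can be worked out by hand. The sketch below assumes the usual floor-based output size formula, `(size + 2 * pad - kernel) // stride + 1`, which is what `ceil_mode=False` gives on the PyTorch side and is consistent with the Detectron shapes logged above. The helper names `conv_out_size` and `stem_shapes` are made up for illustration.

```python
def conv_out_size(size, kernel, stride, pad):
    """Output length along one spatial axis for a conv or pool with floor rounding."""
    return (size + 2 * pad - kernel) // stride + 1


def stem_shapes(n, h, w, channels=64):
    """Shapes after conv1 (7x7, stride 2, pad 3) and pool1 (3x3, stride 2, pad 1)."""
    conv_h = conv_out_size(h, kernel=7, stride=2, pad=3)
    conv_w = conv_out_size(w, kernel=7, stride=2, pad=3)
    pool_h = conv_out_size(conv_h, kernel=3, stride=2, pad=1)
    pool_w = conv_out_size(conv_w, kernel=3, stride=2, pad=1)
    return (n, channels, conv_h, conv_w), (n, channels, pool_h, pool_w)


conv1, pool1 = stem_shapes(2, 512, 896)
print(conv1)  # (2, 64, 256, 448)
print(pool1)  # (2, 64, 128, 224)
```

Both results match the Detectron log for a `(2, 3, 512, 896)` input. AffineChannel and ReLU are left out because they do not change the shape.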

Differences: