Convolutions with stride > 2 are handled a bit differently (this can be corrected).
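A common source of this kind of mismatch (an assumption here, not stated in the notes, and assuming the second framework is Keras/TensorFlow) is the padding convention: Keras `padding='same'` pads asymmetrically for strided convolutions, while PyTorch's explicit `padding=1` pads symmetrically. A minimal sketch of one way to correct it on the Keras side, by padding explicitly and using `'valid'`:

```python
from tensorflow.keras import Input, layers

inputs = Input(shape=(32, 32, 16))
# Symmetric 1-pixel padding, as nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1) would do
x = layers.ZeroPadding2D(padding=1)(inputs)
x = layers.Conv2D(16, kernel_size=3, strides=2, padding='valid', use_bias=False)(x)
```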
Weight initialization is different (for both conv and dense layers).
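The framework defaults do differ: PyTorch's `nn.Conv2d`/`nn.Linear` use Kaiming uniform (with `a=sqrt(5)`), while Keras' `Conv2D`/`Dense` use Glorot uniform with zero biases (again assuming the second framework is Keras). A minimal sketch of re-initialising the PyTorch layers to the Keras defaults; the helper name is hypothetical:

```python
import torch.nn as nn

def keras_style_init(module):
    # Glorot/Xavier uniform weights and zero biases (the Keras defaults),
    # instead of PyTorch's default Kaiming uniform with a=sqrt(5).
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# model.apply(keras_style_init)
```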
Empirically: SGD works much better for PyTorch (reaching 95.5%).
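For reference, a standard PyTorch SGD setup of the kind typically used here; the hyperparameters below are illustrative placeholders, not the ones behind the 95.5% result:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)   # stand-in for the actual ResNet
num_epochs = 100            # illustrative

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4, nesterov=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
```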
Batch norm seems to be implemented differently in the two frameworks: it causes a difference in the loss at training time. However, we could not verify this when comparing the batch-norm implementations in isolation.
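The defaults at least are known to differ (assuming the second framework is Keras): the epsilons are 1e-5 (PyTorch) vs 1e-3 (Keras), and the momentum arguments use opposite conventions (PyTorch's 0.1 is the weight of the new batch statistics, Keras' 0.99 is the decay of the running statistics). Since training-mode batch norm normalizes with batch statistics, the epsilon is the more plausible cause of a training-loss difference; the momentum only affects the running stats used at eval time. A sketch of matching the Keras layer to the PyTorch one:

```python
import torch.nn as nn
from tensorflow.keras import layers

# PyTorch defaults: eps=1e-5, momentum=0.1 (weight given to the new batch stats).
bn_torch = nn.BatchNorm2d(64)
# Keras defaults: epsilon=1e-3, momentum=0.99 (decay of the running stats).
# Matching bn_torch therefore means epsilon=1e-5 and momentum = 1 - 0.1 = 0.9.
bn_keras = layers.BatchNormalization(epsilon=1e-5, momentum=0.9)
```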
ResNet implementation: