DL4J: Possible ResNet50 Inference Regression - beta3 to snapshots? (CPU)

Hi Alex @AlexDBlack

Attached are the test code and pom files I am using to test the performance of ResNet50. The test code is in PretrainedClassification.java. It does 2 things, as you will see in the comments.

Approach 1: Verified that I get the same results from the pretrained SqueezeNet and ResNet50 zoo models, using only 1 static image (you can ignore this part of the code).

Approach 2: Measure the performance of the actual target under consideration, ResNet50. I ran warm-up cycles first and discarded their timings. After that, I timed each batch and reported the average time per image across the batches (on Gitter). I used a batch size of 512 and about 2500 images from ImageNet data (link in the code itself).
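For anyone who wants to reproduce the measurement without opening the zip, here is a minimal sketch of the timing scheme from Approach 2. It is not the code in PretrainedClassification.java: the class name `InferenceTimer`, both method names, and the `runBatch` callback (which stands in for one forward pass over a batch of the given size) are all hypothetical. The one design choice worth noting is that the average is total time divided by total images, so a smaller final batch (2500 images at batch size 512 leaves a last batch of 452) is weighted correctly.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of DL4J or of the attached test code.
public class InferenceTimer {

    // Average milliseconds per image: total elapsed time divided by total images,
    // so batches of different sizes are weighted by how many images they hold.
    public static double averageMillisPerImage(long[] batchNanos, int[] batchSizes) {
        long totalNanos = 0;
        long totalImages = 0;
        for (int i = 0; i < batchNanos.length; i++) {
            totalNanos += batchNanos[i];
            totalImages += batchSizes[i];
        }
        return totalNanos / 1e6 / totalImages;
    }

    // runBatch is an assumed callback that runs inference on a batch of the given size.
    public static double benchmark(IntConsumer runBatch, int totalImages,
                                   int batchSize, int warmupBatches) {
        // Warm-up passes: executed but never timed.
        for (int i = 0; i < warmupBatches; i++) {
            runBatch.accept(batchSize);
        }
        // Number of timed batches, rounding up so a partial last batch is included.
        int numBatches = (totalImages + batchSize - 1) / batchSize;
        long[] nanos = new long[numBatches];
        int[] sizes = new int[numBatches];
        for (int b = 0; b < numBatches; b++) {
            sizes[b] = Math.min(batchSize, totalImages - b * batchSize);
            long start = System.nanoTime();
            runBatch.accept(sizes[b]);
            nanos[b] = System.nanoTime() - start;
        }
        return averageMillisPerImage(nanos, sizes);
    }
}
```

To use it, pass a lambda that wraps the model's forward pass for a batch of `n` images, together with the image count, batch size, and number of warm-up batches, for example `InferenceTimer.benchmark(runBatch, 2500, 512, 3)`, where 3 warm-up batches is an arbitrary choice.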

System Details:
1) Ubuntu 18.04
2) 24c Xeon system (2-socket system with 24c each, but I pinned the JVM to 1 socket (24c) using numactl, i.e. numactl -m 0 -N 0, and verified with htop that it is actually using 1 socket)
3) Used out-of-box frequency
4) JVM options: -Xms29g -Xmx29g -XX:+UseG1GC -XX:ParallelGCThreads=1
5) Other environment options and perf top info, as in https://gist.github.com/dollarHome/b66dd82f5443ad9205abf901c6670dd2:
KMP_BLOCK_TIME=0
OMP_WAIT_POLICY=PASSIVE
MKL_THREADING_LAYER=GNU
MKL_NUM_THREADS=24
OMP_DISPLAY_ENV=VERBOSE
OMP_NUM_THREADS=24

PretrianedClassificationCode.zip

DL4J: Possible ResNet50 Inference Regression - beta3 to snapshots? (CPU) #7271