Open AlexDBlack opened 5 years ago
Hi Alex @AlexDBlack
Attached are the test code and pom files I am using to test the performance of ResNet50. The test code is in PretrainedClassification.java. I did two things in this code, as you will see in the comments.
Approach 1: Using the pretrained SqueezeNet and ResNet50 zoo models, verified that I get the same results from both with a single static image (you can ignore this part of the code).
Approach 2: Measure the performance of the actual target under consideration, ResNet50. I ran warm-up cycles first and discarded that data. After that, I timed each batch and reported the average time per image across the batches (on gitter). I used a batch size of 512 and about 2500 images from the ImageNet data (link in the code itself).
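For readers without the attachment, the measurement loop in Approach 2 can be sketched roughly as below. This is not the code from PretrainedClassification.java. The class name BatchTiming, the fakeInference workload, the warm-up count of 3, and the exact split of 2500 images into batches are all assumptions for illustration, and "average time per image" is taken here to mean total measured time divided by total images, which is only one possible reading.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of the attached test code.
final class BatchTiming {

    /** Total measured time divided by total images, in milliseconds. */
    static double averageMillisPerImage(long[] batchNanos, int[] batchSizes) {
        long totalNanos = 0;
        long totalImages = 0;
        for (int i = 0; i < batchNanos.length; i++) {
            totalNanos += batchNanos[i];
            totalImages += batchSizes[i];
        }
        return totalNanos / 1e6 / totalImages;
    }

    /** Runs warm-up batches (timings discarded), then times each measured batch. */
    static double benchmark(IntConsumer runBatch, int[] batchSizes, int warmupBatches) {
        for (int i = 0; i < warmupBatches; i++) {
            runBatch.accept(batchSizes[i % batchSizes.length]);
        }
        long[] batchNanos = new long[batchSizes.length];
        for (int i = 0; i < batchSizes.length; i++) {
            long start = System.nanoTime();
            runBatch.accept(batchSizes[i]);
            batchNanos[i] = System.nanoTime() - start;
        }
        return averageMillisPerImage(batchNanos, batchSizes);
    }

    public static void main(String[] args) {
        // Assumed split: 2500 images at batch size 512 gives four full batches plus 452.
        int[] batchSizes = {512, 512, 512, 512, 452};

        // Stand-in workload; a real run would call the network's forward pass here.
        IntConsumer fakeInference = n -> {
            double acc = 0;
            for (int i = 0; i < n * 10_000; i++) {
                acc += Math.sqrt(i);
            }
            if (acc < 0) {
                throw new IllegalStateException("unreachable; keeps the loop from being optimized away");
            }
        };

        double avg = benchmark(fakeInference, batchSizes, 3);
        System.out.printf("average ms per image: %.4f%n", avg);
    }
}
```

Dividing total time by total images weights the partial last batch by its size. Averaging the per-batch averages instead would give that smaller batch the same weight as a full one.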
System Details:
1) Ubuntu 18.04
2) 24-core Xeon system (2-socket system with 24 cores per socket; I pinned the JVM to one socket (24 cores) with numactl, i.e. numactl -m 0 -N 0, and verified with htop that it is actually using one socket)
3) Used out-of-the-box CPU frequency
4) JVM options:
-Xms29g -Xmx29g
-XX:+UseG1GC
-XX:ParallelGCThreads=1
5) Other Environment options and perf top info: as in https://gist.github.com/dollarHome/b66dd82f5443ad9205abf901c6670dd2
KMP_BLOCK_TIME=0
OMP_WAIT_POLICY=PASSIVE
MKL_THREADING_LAYER=GNU
MKL_NUM_THREADS=24
OMP_DISPLAY_ENV=VERBOSE
OMP_NUM_THREADS=24
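To make the setup above easier to reproduce, here is a minimal sketch that assembles the same numactl pinning, JVM options, and environment variables into a single launch from Java. The class name PinnedLaunch and the classpath value app.jar are hypothetical. The main class name is only inferred from the attached file name PretrainedClassification.java and may need a package prefix. The sketch builds the process description but does not start it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical launcher sketch; not part of the attached test code.
final class PinnedLaunch {

    /** numactl pinning to node 0 followed by the JVM options listed above. */
    static List<String> command(String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("-m");
        cmd.add("0");
        cmd.add("-N");
        cmd.add("0");
        cmd.add("java");
        cmd.add("-Xms29g");
        cmd.add("-Xmx29g");
        cmd.add("-XX:+UseG1GC");
        cmd.add("-XX:ParallelGCThreads=1");
        cmd.add("-cp");
        cmd.add(classpath); // assumption: the caller supplies the real classpath
        cmd.add(mainClass);
        return cmd;
    }

    /** Environment variables exactly as listed in the report. */
    static void applyEnvironment(Map<String, String> env) {
        env.put("KMP_BLOCK_TIME", "0");
        env.put("OMP_WAIT_POLICY", "PASSIVE");
        env.put("MKL_THREADING_LAYER", "GNU");
        env.put("MKL_NUM_THREADS", "24");
        env.put("OMP_DISPLAY_ENV", "VERBOSE");
        env.put("OMP_NUM_THREADS", "24");
    }

    /** Builds, but does not start, the pinned benchmark process. */
    static ProcessBuilder build(String classpath, String mainClass) {
        ProcessBuilder pb = new ProcessBuilder(command(classpath, mainClass));
        applyEnvironment(pb.environment());
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build("app.jar", "PretrainedClassification");
        System.out.println(String.join(" ", pb.command()));
    }
}
```

Calling start() on the returned ProcessBuilder would launch the run. Keeping the settings in one place makes it less likely that a reproduction attempt silently differs in thread counts or pinning.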
Unconfirmed/not yet reproduced, as reported in gitter:
Earlier benchmarks we ran on ResNet50 on CPU suggested that training performance on snapshots is better than it was for beta3. Some possibilities: (a) performance has regressed again; (b) training is faster but inference is slower; (c) relative performance (beta3 vs. snapshots) is hardware dependent.
Aha! Link: https://skymindai.aha.io/features/DL4J-6