Alibaba-Gemini-Lab / OpenCheetah

189 stars 48 forks source link

Slow linear layer compared to non-linear layer #26

Closed hobbitlzy closed 1 year ago

hobbitlzy commented 1 year ago

Hello! Thank you for your outstanding work. I have been running the ResNet50 model following the instructions provided in the ReadMe file. While doing so, I noticed that the linear layer's latency is longer than that of the non-linear layer, which seems to contradict the expectation that linear layers would require fewer rounds and less data for communication.

Upon running the model, I obtained the following log data:

Total time in Conv = 175.196 seconds.
Total time in MatMul = 0.253 seconds.
Total time in BatchNorm = 8.453 seconds.
Total time in Truncation = 32.28 seconds.
Total time in Relu = 46.977 seconds.
Total time in MaxPool = 7.566 seconds.
Total time in AvgPool = 0.042 seconds.
Total time in ArgMax = 0.021 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.

Based on these results, it appears that the linear layer is slower than the ReLU layer. However, I am unsure as to why this is the case. I suspect that the 4-thread CPU setting may be running slow for the linear layer and it is bottlenecked on the computation. Could you kindly provide some insight on this matter?

Thank you for your time and attention to this issue.

fionser commented 1 year ago
hobbitlzy commented 1 year ago

Thanks for your reply. So you mean the slowdown comes from both communication and computatio?. My device uses ARM CPU. Does this mean I cannot obtain the desired performance?

fionser commented 1 year ago

Yes, the HEXL is developed by an Intel team :) I will expect ARM CPU will have a slower performance than the latest Intel CPU.

hobbitlzy commented 1 year ago

OK, that is bad...

hobbitlzy commented 1 year ago

After moving the repository to a new machine with an Intel CPU that supports AVX512, I found that despite this improvement, the convolution process still runs slowly. The model is resnet50 and the setting is a local loop without any bandwidth limitation. I have included the full results below - can you confirm whether this output is normal?

Total time taken = 188059 milliseconds.
Total data sent = 645.472 MiB.
Number of rounds = 796
Total comm (sent+received) = (see SERVER OUTPUT)
------------------------------------------------------
Total time in Conv = 102.03 seconds.
Total time in MatMul = 0.224 seconds.
Total time in BatchNorm = 15.254 seconds.
Total time in Truncation = 40.699 seconds.
Total time in Relu = 4.086 seconds.
Total time in MaxPool = 7.574 seconds.
Total time in AvgPool = 0.517 seconds.
Total time in ArgMax = 2.442 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.
------------------------------------------------------
Conv data sent = 167.762 MiB.
MatMul data sent = 1.19058 MiB.
BatchNorm data sent = 189.905 MiB.
Truncation data sent = 172.698 MiB.
Relu data sent = 68.7148 MiB.
Maxpool data sent = 44.5175 MiB.
Avgpool data sent = 0.644144 MiB.
ArgMax data sent = 0.0393038 MiB.
MatAdd data sent = 0 MiB.
MatAddBroadCast data sent = 0 MiB.
MulCir data sent = 0 MiB.
Sigmoid data sent = 0 MiB.
Tanh data sent = 0 MiB.
Sqrt data sent = 0 MiB.
NormaliseL2 data sent = 0 MiB.
fionser commented 1 year ago

It seems your program is slow. Here is my log. So make sure you have built SEAL using the HEXL library.

LAN-1core-client-output

Total time taken = 234781 milliseconds.
Total data sent = 532.123 MiB.
Number of rounds = 1404
Total comm (sent+received) = (see SERVER OUTPUT)
------------------------------------------------------
Total time in Conv = 72.391 seconds.
Total time in MatMul = 0.111 seconds.
Total time in BatchNorm = 7.078 seconds.
Total time in Truncation = 55.729 seconds.
Total time in Relu = 75.088 seconds.
Total time in MaxPool = 13.736 seconds.
Total time in AvgPool = 0.031 seconds.
Total time in ArgMax = 0.026 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.
------------------------------------------------------
Conv data sent = 155.43 MiB.
MatMul data sent = 1.10317 MiB.
BatchNorm data sent = 186.266 MiB.
Truncation data sent = 54.2914 MiB.
Relu data sent = 114.311 MiB.
Maxpool data sent = 20.3994 MiB.
Avgpool data sent = 0.282445 MiB.
ArgMax data sent = 0.0393038 MiB.
MatAdd data sent = 0 MiB.
MatAddBroadCast data sent = 0 MiB.
MulCir data sent = 0 MiB.
Sigmoid data sent = 0 MiB.
Tanh data sent = 0 MiB.
Sqrt data sent = 0 MiB.
NormaliseL2 data sent = 0 MiB.
------------------------------------------------------

LAN-4core-client-output

Total time taken = 80718 milliseconds.
Total data sent = 819.365 MiB.
Number of rounds = 1388
Total comm (sent+received) = (see SERVER OUTPUT)
------------------------------------------------------
Total time in Conv = 25.532 seconds.
Total time in MatMul = 0.064 seconds.
Total time in BatchNorm = 3.851 seconds.
Total time in Truncation = 18.255 seconds.
Total time in Relu = 24.358 seconds.
Total time in MaxPool = 3.086 seconds.
Total time in AvgPool = 0.021 seconds.
Total time in ArgMax = 0.025 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.
------------------------------------------------------
Conv data sent = 155.43 MiB.
MatMul data sent = 1.1032 MiB.
BatchNorm data sent = 186.264 MiB.
Truncation data sent = 172.698 MiB.
Relu data sent = 258.67 MiB.
Maxpool data sent = 44.5175 MiB.
Avgpool data sent = 0.644144 MiB.
ArgMax data sent = 0.0393038 MiB.
MatAdd data sent = 0 MiB.
MatAddBroadCast data sent = 0 MiB.
MulCir data sent = 0 MiB.
Sigmoid data sent = 0 MiB.
Tanh data sent = 0 MiB.
Sqrt data sent = 0 MiB.
NormaliseL2 data sent = 0 MiB.
------------------------------------------------------
hobbitlzy commented 1 year ago

Thanks for your quick reply. Based on the information provided, it seems that the SEAL may have been installed with HEXL enabled. The CMake log displays the message "SEAL_USE_INTEL_HEXL: ON" indicating that the SEAL library is using the Intel HEXL library. However, there is a warning present in the log, does it cause the problem? Now I cannot figure out a way to fix it.

-- Build type (CMAKE_BUILD_TYPE): Release
-- Microsoft SEAL debug mode: OFF
-- SEAL_USE_CXX17: ON
-- SEAL_BUILD_DEPS: OFF
-- SEAL_USE_MSGSL: OFF
-- SEAL_USE_ZLIB: OFF
-- SEAL_USE_ZSTD: ON
-- SEAL_USE_INTEL_HEXL: ON
-- Could NOT find CpuFeatures (missing: CpuFeatures_DIR)
CMake Warning at /home/tmp/Opencheetah/build/lib/cmake/hexl-1.2.2/HEXLConfig.cmake:41 (message):
  Could not find pre-installed CpuFeatures; using CpuFeatures packaged with
  HEXL
Call Stack (most recent call first):
  CMakeLists.txt:226 (find_package)

-- Intel HEXL found
-- BUILD_SHARED_LIBS: OFF
-- SEAL_THROW_ON_TRANSPARENT_CIPHERTEXT: ON
-- SEAL_USE_GAUSSIAN_NOISE: OFF
-- SEAL_DEFAULT_PRNG: Blake2xb
-- x86intrin.h - found
-- SEAL_USE_INTRIN: ON
-- SEAL_USE_MEMSET_S: OFF
-- SEAL_USE_EXPLICIT_BZERO: ON
-- SEAL_USE_EXPLICIT_MEMSET: OFF
-- SEAL_BUILD_SEAL_C: OFF
-- Intel HEXL does not have a pkg-config file
-- SEAL_BUILD_EXAMPLES: OFF
-- SEAL_BUILD_TESTS: OFF
-- SEAL_BUILD_BENCH: OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tmp/Opencheetah/build/deps/SEAL
[100%] Built target seal
fionser commented 1 year ago

Why your ReLU takes only small time ? It seems weird.

fionser commented 1 year ago

I re-run the program on a x86 machine (without AVX512) (cat /proc/cpuinfo | grep avx to see whether the AVX512 is enabled)

Here is the log on the client side (using scripts/run-client cheetah resnet50 command)

Using Eigen for Matmul
Doing BaseOT ...
using silent ot pack
using silent ot pack
using silent ot pack
using silent ot pack
After one-time setup, communication
Thread i = 0, total data sent till now = 830913
Thread i = 1, total data sent till now = 1726247
Thread i = 2, total data sent till now = 1726247
Thread i = 3, total data sent till now = 1726247
-----------Syncronizing-----------
secret_share_mod: 137438953472 bitlength: 37
backend: Ring-SilentOT-Cheetah
-----------Syncronized - now starting execution-----------
HomConv #1 called N=1, H=230, W=230, CI=3, FH=7, FW=7, CO=64, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.807] sent [2.85732] MB
expect -4290393841 but got -4290393840
expect -5031087768 but got -5031087767
expect -511922748 but got -511922749
expect 244766175 but got 244766174
Convolution Output Matches
Maxpool 1 called N=1, H=56, W=56, C=64, ksizeH=3, ksizeW=3
Time in sec for current maxpool = 1.864
Truncate #1 on 200704 points by 12 bits
HomBN #1 on shape [64, 56, 56]
Time in sec for current BN = [0.085] sent [3.8101] MB
Relu #1 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.205
HomConv #2 called N=1, H=56, W=56, CI=64, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.374] sent [3.81004] MB
expect 57409406 but got 57409407
expect -9739872 but got -9739871
expect 16252788 but got 16252787
expect 5138488 but got 5138487
Convolution Output Matches
HomConv #3 called N=1, H=56, W=56, CI=64, FH=1, FW=1, CO=64, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.112] sent [3.81009] MB
expect 3448064 but got 3448065
expect -294430681 but got -294430680
expect -53807499 but got -53807500
Convolution Output Matches
Relu #3 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.221
HomConv #4 called N=1, H=56, W=56, CI=64, FH=3, FW=3, CO=64, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.408] sent [3.81015] MB
expect -12052429 but got -12052428
expect -27662290 but got -27662289
expect 15449137 but got 15449136
Convolution Output Matches
Relu #5 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.218
HomConv #5 called N=1, H=56, W=56, CI=64, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.348] sent [3.81011] MB
expect -51426190 but got -51426189
expect -24289380 but got -24289379
expect 8822318 but got 8822317
expect 7164951 but got 7164950
Convolution Output Matches
Truncate #2 on 802816 points by 12 bits
HomBN #2 on shape [256, 56, 56]
Time in sec for current BN = [0.348] sent [15.2399] MB
Relu #7 on 802816 points, truncate=1 by 12 bits
Time in sec for current relu = 1.141
HomConv #6 called N=1, H=56, W=56, CI=256, FH=1, FW=1, CO=64, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.359] sent [15.2399] MB
expect -11879929 but got -11879928
expect -103985557 but got -103985556
expect -28015820 but got -28015821
expect -17604984 but got -17604985
Convolution Output Matches
Relu #9 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.209
HomConv #7 called N=1, H=56, W=56, CI=64, FH=3, FW=3, CO=64, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.408] sent [3.81027] MB
expect 19324650 but got 19324651
expect -13122733 but got -13122732
expect -18829546 but got -18829547
expect 7479784 but got 7479783
Convolution Output Matches
Relu #11 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.219
HomConv #8 called N=1, H=56, W=56, CI=64, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.348] sent [3.81006] MB
expect -5550545 but got -5550544
expect -14479383 but got -14479382
expect -853379 but got -853380
expect -3725294 but got -3725295
Convolution Output Matches
Truncate #3 on 802816 points by 12 bits
HomBN #3 on shape [256, 56, 56]
Time in sec for current BN = [0.32] sent [15.2402] MB
Relu #13 on 802816 points, truncate=1 by 12 bits
Time in sec for current relu = 1.41
HomConv #9 called N=1, H=56, W=56, CI=256, FH=1, FW=1, CO=64, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.343] sent [15.2399] MB
expect 943516 but got 943517
expect 33271774 but got 33271775
expect -4693996 but got -4693997
Convolution Output Matches
Relu #15 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.465
HomConv #10 called N=1, H=56, W=56, CI=64, FH=3, FW=3, CO=64, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.411] sent [3.81006] MB
expect -12703974 but got -12703973
expect -5258703 but got -5258702
expect 19718977 but got 19718976
expect 55292447 but got 55292446
Convolution Output Matches
Relu #17 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.234
HomConv #11 called N=1, H=56, W=56, CI=64, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.353] sent [3.81014] MB
expect 22815158 but got 22815159
expect 10285235 but got 10285236
expect -9902726 but got -9902727
expect -956386 but got -956387
Convolution Output Matches
Truncate #4 on 802816 points by 12 bits
HomBN #4 on shape [256, 56, 56]
Time in sec for current BN = [0.344] sent [15.2401] MB
Relu #19 on 802816 points, truncate=1 by 12 bits
Time in sec for current relu = 1.147
HomConv #12 called N=1, H=56, W=56, CI=256, FH=1, FW=1, CO=512, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [2.377] sent [3.09569] MB
expect -4725960 but got -4725959
expect 151049 but got 151050
expect 6608136 but got 6608135
Convolution Output Matches
HomConv #13 called N=1, H=56, W=56, CI=256, FH=1, FW=1, CO=128, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [0.558] sent [15.2402] MB
expect 15217623 but got 15217624
expect -20843562 but got -20843561
Convolution Output Matches
Relu #21 on 401408 points, truncate=1 by 12 bits
Time in sec for current relu = 0.442
HomConv #14 called N=1, H=58, W=58, CI=128, FH=3, FW=3, CO=128, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.469] sent [7.61977] MB
expect 2597765 but got 2597766
expect -1474728 but got -1474727
Convolution Output Matches
Relu #23 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.448
HomConv #15 called N=1, H=28, W=28, CI=128, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.257] sent [1.54788] MB
expect -3100565 but got -3100564
expect -278137 but got -278136
expect -1516490 but got -1516491
expect 9846200 but got 9846199
Convolution Output Matches
Truncate #5 on 401408 points by 12 bits
HomBN #1 via element-wise mult on 401408 points
ElemWiseSecretVectorMult Output Matches
Relu #25 on 401408 points, truncate=1 by 12 bits
Time in sec for current relu = 1.045
HomConv #16 called N=1, H=28, W=28, CI=512, FH=1, FW=1, CO=128, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.169] sent [6.13192] MB
expect 47273880 but got 47273881
expect 23085395 but got 23085396
Convolution Output Matches
Relu #27 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.102
HomConv #17 called N=1, H=28, W=28, CI=128, FH=3, FW=3, CO=128, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.402] sent [1.90508] MB
expect 8880339 but got 8880340
expect -27604470 but got -27604469
Convolution Output Matches
Truncate #6 on 100352 points by 12 bits
HomBN #2 via element-wise mult on 100352 points
ElemWiseSecretVectorMult Output Matches
Relu #29 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.105
HomConv #18 called N=1, H=28, W=28, CI=128, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.256] sent [1.54786] MB
expect 6773400 but got 6773401
expect 6338867 but got 6338868
expect -7099887 but got -7099888
expect 9091081 but got 9091080
Convolution Output Matches
Truncate #7 on 401408 points by 12 bits
HomBN #3 via element-wise mult on 401408 points
ElemWiseSecretVectorMult Output Matches
Relu #31 on 401408 points, truncate=1 by 12 bits
Time in sec for current relu = 0.436
HomConv #19 called N=1, H=28, W=28, CI=512, FH=1, FW=1, CO=128, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.172] sent [6.1315] MB
expect 7606951 but got 7606952
expect -27852961 but got -27852960
expect -53574 but got -53575
expect 3398425 but got 3398424
Convolution Output Matches
Relu #33 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.126
HomConv #20 called N=1, H=28, W=28, CI=128, FH=3, FW=3, CO=128, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.401] sent [1.90514] MB
expect -4528510 but got -4528509
expect -25287874 but got -25287873
expect 10350653 but got 10350652
Convolution Output Matches
Relu #35 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.105
HomConv #21 called N=1, H=28, W=28, CI=128, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.26] sent [1.54754] MB
expect -1960600 but got -1960599
expect 162465 but got 162466
expect -8777247 but got -8777248
Convolution Output Matches
Truncate #8 on 401408 points by 12 bits
HomBN #4 via element-wise mult on 401408 points
ElemWiseSecretVectorMult Output Matches
Relu #37 on 401408 points, truncate=1 by 12 bits
Time in sec for current relu = 0.441
HomConv #22 called N=1, H=28, W=28, CI=512, FH=1, FW=1, CO=128, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.17] sent [6.13171] MB
expect -34522541 but got -34522540
expect 6573825 but got 6573826
Convolution Output Matches
Relu #39 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.46
HomConv #23 called N=1, H=28, W=28, CI=128, FH=3, FW=3, CO=128, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.427] sent [1.90511] MB
expect -20522876 but got -20522875
expect 12199753 but got 12199754
Convolution Output Matches
Relu #41 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.129
HomConv #24 called N=1, H=28, W=28, CI=128, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.263] sent [1.54797] MB
expect -3642934 but got -3642933
expect 5015489 but got 5015490
expect -1368909 but got -1368910
expect -3797912 but got -3797913
Convolution Output Matches
Truncate #9 on 401408 points by 12 bits
HomBN #5 via element-wise mult on 401408 points
ElemWiseSecretVectorMult Output Matches
Relu #43 on 401408 points, truncate=1 by 12 bits
Time in sec for current relu = 0.671
HomConv #25 called N=1, H=28, W=28, CI=512, FH=1, FW=1, CO=1024, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [2.472] sent [1.54778] MB
expect -2854091 but got -2854090
expect -5968997 but got -5968996
expect 194502 but got 194501
Convolution Output Matches
HomConv #26 called N=1, H=28, W=28, CI=512, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [2.247] sent [6.13159] MB
expect 8542039 but got 8542040
expect -6078174 but got -6078173
expect 21506685 but got 21506684
Convolution Output Matches
Relu #45 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.214
HomConv #27 called N=1, H=30, W=30, CI=256, FH=3, FW=3, CO=256, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.477] sent [3.81017] MB
expect -5936975 but got -5936974
expect -13913056 but got -13913055
Convolution Output Matches
Relu #47 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.057
HomConv #28 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.422] sent [0.773912] MB
expect 561634 but got 561635
expect -1536377 but got -1536376
Convolution Output Matches
Truncate #10 on 200704 points by 12 bits
HomBN #6 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #49 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.463
HomConv #29 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.194] sent [3.09587] MB
expect -64502765 but got -64502764
expect -441100 but got -441099
Convolution Output Matches
Relu #51 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.048
HomConv #30 called N=1, H=14, W=14, CI=256, FH=3, FW=3, CO=256, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.433] sent [0.952634] MB
expect -15416465 but got -15416464
expect -32153706 but got -32153705
Convolution Output Matches
Relu #53 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.048
HomConv #31 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.375] sent [0.774064] MB
expect 2041729 but got 2041730
expect -3319148 but got -3319147
Convolution Output Matches
Truncate #11 on 200704 points by 12 bits
HomBN #7 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #55 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.206
HomConv #32 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.194] sent [3.09566] MB
expect -10782788 but got -10782787
expect 6820609 but got 6820610
Convolution Output Matches
Relu #57 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.048
HomConv #33 called N=1, H=14, W=14, CI=256, FH=3, FW=3, CO=256, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.441] sent [0.952512] MB
expect 5701299 but got 5701300
expect -22038103 but got -22038102
Convolution Output Matches
Relu #59 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.048
HomConv #34 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.378] sent [0.773971] MB
expect -3501149 but got -3501148
expect 518803 but got 518804
Convolution Output Matches
Truncate #12 on 200704 points by 12 bits
HomBN #8 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #61 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.205
HomConv #35 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.193] sent [3.09567] MB
expect -22456621 but got -22456620
expect -991657 but got -991656
Convolution Output Matches
Relu #63 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.057
HomConv #36 called N=1, H=14, W=14, CI=256, FH=3, FW=3, CO=256, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.433] sent [0.952647] MB
expect -17655424 but got -17655423
expect -33496073 but got -33496072
Convolution Output Matches
Relu #65 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.049
HomConv #37 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.382] sent [0.773881] MB
expect -524669 but got -524668
expect -769614 but got -769613
Convolution Output Matches
Truncate #13 on 200704 points by 12 bits
HomBN #9 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #67 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.459
HomConv #38 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.193] sent [3.09537] MB
expect 63619529 but got 63619530
expect -25549968 but got -25549967
Convolution Output Matches
Relu #69 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.049
HomConv #39 called N=1, H=14, W=14, CI=256, FH=3, FW=3, CO=256, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.43] sent [0.952485] MB
expect 2533347 but got 2533348
expect -14582123 but got -14582122
Convolution Output Matches
Relu #71 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.048
HomConv #40 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.375] sent [0.773922] MB
expect -2237704 but got -2237703
expect -2127227 but got -2127226
Convolution Output Matches
Truncate #14 on 200704 points by 12 bits
HomBN #10 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #73 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.474
HomConv #41 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=256, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.192] sent [3.09576] MB
expect -13715 but got -13714
expect 469763 but got 469764
expect -6291053 but got -6291054
Convolution Output Matches
Relu #75 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.049
HomConv #42 called N=1, H=14, W=14, CI=256, FH=3, FW=3, CO=256, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.431] sent [0.952525] MB
expect -6356943 but got -6356942
expect -7326302 but got -7326301
Convolution Output Matches
Relu #77 on 50176 points, truncate=1 by 12 bits
Time in sec for current relu = 0.049
HomConv #43 called N=1, H=14, W=14, CI=256, FH=1, FW=1, CO=1024, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.375] sent [0.773769] MB
expect -6268075 but got -6268074
expect -5925456 but got -5925455
Convolution Output Matches
Truncate #15 on 200704 points by 12 bits
HomBN #11 via element-wise mult on 200704 points
ElemWiseSecretVectorMult Output Matches
Relu #79 on 200704 points, truncate=1 by 12 bits
Time in sec for current relu = 0.207
HomConv #44 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=2048, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [2.74] sent [0.773945] MB
expect 939371 but got 939372
expect -235329 but got -235328
Convolution Output Matches
HomConv #45 called N=1, H=14, W=14, CI=1024, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [2.306] sent [3.09553] MB
expect -15964781 but got -15964780
expect 7051263 but got 7051264
Convolution Output Matches
Relu #81 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.099
HomConv #46 called N=1, H=16, W=16, CI=512, FH=3, FW=3, CO=512, S=2, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.535] sent [1.90488] MB
expect 22808 but got 22809
expect -3319866 but got -3319865
Convolution Output Matches
Relu #83 on 25088 points, truncate=1 by 12 bits
Time in sec for current relu = 0.025
HomConv #47 called N=1, H=7, W=7, CI=512, FH=1, FW=1, CO=2048, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.735] sent [0.416785] MB
expect -3260123 but got -3260122
expect -1728561 but got -1728560
Convolution Output Matches
Truncate #16 on 100352 points by 12 bits
HomBN #12 via element-wise mult on 100352 points
ElemWiseSecretVectorMult Output Matches
Relu #85 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.099
HomConv #48 called N=1, H=7, W=7, CI=2048, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.217] sent [1.48826] MB
expect -8492346 but got -8492345
expect 6039291 but got 6039292
Convolution Output Matches
Relu #87 on 25088 points, truncate=1 by 12 bits
Time in sec for current relu = 0.025
HomConv #49 called N=1, H=7, W=7, CI=512, FH=3, FW=3, CO=512, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.638] sent [0.654822] MB
expect -30938076 but got -30938075
expect -25609005 but got -25609004
Convolution Output Matches
Relu #89 on 25088 points, truncate=1 by 12 bits
Time in sec for current relu = 0.028
HomConv #50 called N=1, H=7, W=7, CI=512, FH=1, FW=1, CO=2048, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.703] sent [0.416807] MB
expect -1367648 but got -1367647
expect -3361 but got -3360
Convolution Output Matches
Truncate #17 on 100352 points by 12 bits
HomBN #13 via element-wise mult on 100352 points
ElemWiseSecretVectorMult Output Matches
Relu #91 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.512
HomConv #51 called N=1, H=7, W=7, CI=2048, FH=1, FW=1, CO=512, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.211] sent [1.48839] MB
expect -14493182 but got -14493181
expect -12025638 but got -12025637
Convolution Output Matches
Relu #93 on 25088 points, truncate=1 by 12 bits
Time in sec for current relu = 0.025
HomConv #52 called N=1, H=7, W=7, CI=512, FH=3, FW=3, CO=512, S=1, Padding SAME (1 1 1 1)
Time in sec for current conv = [0.646] sent [0.65486] MB
expect -23437118 but got -23437117
expect -24417546 but got -24417545
Convolution Output Matches
Relu #95 on 25088 points, truncate=1 by 12 bits
Time in sec for current relu = 0.025
HomConv #53 called N=1, H=7, W=7, CI=512, FH=1, FW=1, CO=2048, S=1, Padding VALID (0 0 0 0)
Time in sec for current conv = [1.711] sent [0.416752] MB
expect -6698772 but got -6698771
expect -8264394 but got -8264393
Convolution Output Matches
Truncate #18 on 100352 points by 12 bits
HomBN #14 via element-wise mult on 100352 points
ElemWiseSecretVectorMult Output Matches
Relu #97 on 100352 points, truncate=1 by 12 bits
Time in sec for current relu = 0.1
AvgPool 1 called N=1, H=1, W=1, C=2048, ksizeH=7, ksizeW=7
Time in sec for current avgpool = 0.013
Time in sec for current matmul = 0.099
MatMul Output Matches
ArgMax 1 called, s1=1, s2=1001
Thread i = 0, total data sent till now = 426848967
Thread i = 1, total data sent till now = 201876823
Thread i = 2, total data sent till now = 51414076
Thread i = 3, total data sent till now = 201876823
------------------------------------------------------
------------------------------------------------------
------------------------------------------------------
Total time taken = 89852 milliseconds.
Total data sent = 835.425 MiB.
Number of rounds = 1388
Total comm (sent+received) = (see SERVER OUTPUT)
------------------------------------------------------
Total time in Conv = 56.531 seconds.
Total time in MatMul = 0.099 seconds.
Total time in BatchNorm = 3.149 seconds.
Total time in Truncation = 11.039 seconds.
Total time in Relu = 13.695 seconds.
Total time in MaxPool = 1.864 seconds.
Total time in AvgPool = 0.013 seconds.
Total time in ArgMax = 0.006 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.
------------------------------------------------------
Conv data sent = 167.761 MiB.
MatMul data sent = 1.19066 MiB.
BatchNorm data sent = 189.906 MiB.
Truncation data sent = 172.698 MiB.
Relu data sent = 258.67 MiB.
Maxpool data sent = 44.5175 MiB.
Avgpool data sent = 0.644144 MiB.
ArgMax data sent = 0.0393038 MiB.
MatAdd data sent = 0 MiB.
MatAddBroadCast data sent = 0 MiB.
MulCir data sent = 0 MiB.
Sigmoid data sent = 0 MiB.
Tanh data sent = 0 MiB.
Sqrt data sent = 0 MiB.
NormaliseL2 data sent = 0 MiB.
------------------------------------------------------
top-10 values from ResNet50
[13.0611958,11.7018878,10.7381859,10.4303904,9.8505569,9.2953665,8.8901457,8.2409878,7.6950630,7.4976710,]
predicted label=249
hobbitlzy commented 1 year ago

Why your ReLU takes only small time ? It seems weird.

Sorry for causing the unnecessary question. This is because I am modifying the implementation of the ReLU.

I checked the cat /proc/cpuinfo | grep avx and it is indeed enabled. I devise a walkaround to verify its usage. I close the HEXL when compiling SEAl at this line. This resulted in a significant increase in the convolutional latency. I guess this is sufficient to show the previous log has the AVX enabled. The slow speed may be due to the CPU itself.

Thanks for your patient reply.

fionser commented 1 year ago

BTW, AVX512-IFMA is faster than AVX512-DQ. (The report numbers in our paper are computed with IFMA instruction).