EnigmaSong / nethist

R package for Network histogram
https://enigmasong.github.io/nethist/
Other
2 stars 1 forks source link

Timing performance nethist() in some Linux system #8

Open EnigmaSong opened 1 year ago

EnigmaSong commented 1 year ago

While checking the timing performance of nethist() in various machine, and I realized it is very slow on two (old) linux machines. With #7, It takes less than 9 seconds in average using the following example code on MacBook Air (M1, 2020):

library(nethist)

set.seed(42)
nethist(polblog. h=72, verbose = TRUE) #For single run

#bench::mark({set.seed(42); nethist(polblog. h=72, verbose = TRUE)}, iterations=10) #For 10 runs

Using only one core, it takes more than 8 minutes and 6 minutes (median) on fidis and lab linux machine, respectively (see the table below). However, it takes less than 20 seconds on helvetios. On the other hand, the median of 10 runs using MATLAB code is around 130 seconds and 80 seconds in fidis and helvetios, respectively.

Machine name CPU OS BLAS R.ver nethist.ver min median itr/sec mem_alloc gc/sec n_itr n_gc total
helvetios (scitas) Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz Red Hat Enterprise Linux 8.5 (Ootpa) openblas/0.3.10 4.0.2 0.2.0 19.9s 20.1s 0.0498 NA[^1] 0.234 10 47 3.35m
fidis (scitas) Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz Red Hat Enterprise Linux Server 7.6 (Maipo) openblas/0.3.10 4.0.2 0.2.0 8.09m 8.1m 0.00206 NA[^1] 0.00967 10 47 1.35h
Lab Linux server (2014?)[^1] Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz Ubuntu 18.04.6 LTS libblas.so.3.7.1 4.2.1 0.2.0 6.43m 6.44m 0.00259 199MB 0.0122 ? ? ?
Lab Linux server (2014?) Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz Ubuntu 18.04.6 LTS libopenblas_haswellp-r0.3.21 4.2.2 0.2.0 6.42m 6.71m 0.00249 199MB 0.0117 10 47 1.11h

It would be related to some configuration of Armadillo/RcppArmadillo/R on Linux. On lab's linux machine, I used a variant of graphest_fastgreedy.cpp to measure timing of each step when mm=0 and m = 0, that is, very first iteration of the loop in graphest_fastgreedy(). Here is a rough profiling result from lab's linux machine. We compute Exp. total time = Time * (Num. iteration of level-k loops). Based on the profile result, all the steps using armadillo operations takes much longer than I expected.

Running time = 375.4 sec

Loop level Num. iter.
0 1
1 141
2 2820000
3 4699518
3 $(a\neq b)$ 4422365
Step Loop Level Time/ (ms) Exp. Total time (sec)
Define variables 0 2.9134 0.0029
Initialization 0 5.2368 0.0052
Gen. iVecs 1 2.1588 0.3044
Update Trials 2 0.0030 8.4600
Prep. Label swap 3 0.0147 69.0829
OldTheta 3 $(a\neq b)$ 0.0073 32.2833
Update Acounts 3 $(a\neq b)$ 0.0084 37.1479
Update Theta 3 $(a\neq b)$ 0.0007 3.0957
Update LL 3 $(a\neq b)$ 0.0570 252.0748

$\sum \text{Exp. total time} = 402.5 \text{sec}$

To do

[^1]: R in fidis and helvetios seems not compiled with memory profiling, thus bench::mark() does not return memory profiling info. [^2]: Message from print(summary(res)) provided the part of the results on the log file.

EnigmaSong commented 1 year ago

Using ldd command, I checked RcppArmadillo.so on each machine.

Lab linux machine

> ldd ~/R/x86_64-pc-linux-gnu-library/4.2/RcppArmadillo/libs/RcppArmadillo.so 
linux-vdso.so.1 (0x00007ffcacf1d000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007fa970b89000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007fa96fc12000)
libR.so => /usr/lib/R/lib/libR.so (0x00007fa96f54e000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa96f1c5000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa96ee27000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fa96ebf8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa96e9e0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa96e5ef000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007fa96e210000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa96dff1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa97164d000)
libreadline.so.7 => /lib/x86_64-linux-gnu/libreadline.so.7 (0x00007fa96dda8000)
libpcre2-8.so.0 => /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fa96db26000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fa96d900000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007fa96d6f0000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fa96d4d3000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa96d2cb000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa96d0c7000)
libicuuc.so.60 => /usr/lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007fa96cd0f000)
libicui18n.so.60 => /usr/lib/x86_64-linux-gnu/libicui18n.so.60 (0x00007fa96c86e000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fa96c62e000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fa96c404000)
libicudata.so.60 => /usr/lib/x86_64-linux-gnu/libicudata.so.60 (0x00007fa96a85b000)

Fidis

> ldd R/lib/RcppArmadillo/libs/RcppArmadillo.so 
linux-vdso.so.1 =>  (0x00007ffce3bda000)
libRlapack.so => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/r-4.0.2-rr54swjmivnucqn53qlykyajbssspyzt/rlib/R/lib/libRlapack.so (0x00002ab56adb7000)
libRblas.so => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/r-4.0.2-rr54swjmivnucqn53qlykyajbssspyzt/rlib/R/lib/libRblas.so (0x00002ab56b1a0000)
libgfortran.so.5 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgfortran.so.5 (0x00002ab56b3a5000)
libquadmath.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libquadmath.so.0 (0x00002ab56b817000)
libR.so => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/r-4.0.2-rr54swjmivnucqn53qlykyajbssspyzt/rlib/R/lib/libR.so (0x00002ab56ba57000)
libstdc++.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libstdc++.so.6 (0x00002ab56c0e9000)
libm.so.6 => /lib64/libm.so.6 (0x00002ab56c46d000)
libgomp.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgomp.so.1 (0x00002ab56c76f000)
libgcc_s.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgcc_s.so.1 (0x00002ab56c99d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab56cbb5000)
libc.so.6 => /lib64/libc.so.6 (0x00002ab56cdd1000)
libopenblas.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/openblas-0.3.10-by4yhm4acpz4ytmttv6y3hjvoda5qulz/lib/libopenblas.so.0 (0x00002ab56d19e000)
libz.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/zlib-1.2.11-zolwez4onex3iueybyju2zi55njtdg5d/lib/libz.so.1 (0x00002ab56df20000)
libintl.so.8 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/gettext-0.20.2-zsw7ypm6nwsd35emoejeqeqd24lkhtxl/lib/libintl.so.8 (0x00002ab56e137000)
libreadline.so.8 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/readline-8.0-dz4azvauq6vvqbrsefly22tuc3tkvamk/lib/libreadline.so.8 (0x00002ab56e342000)
libpcre2-8.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/pcre2-10.35-gatbldx3lflurv4nmrbt77xjeqpnat5u/lib/libpcre2-8.so.0 (0x00002ab56e58f000)
liblzma.so.5 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/xz-5.2.5-wohx2lptsznx3ncrffvc2aqzvk773ppl/lib/liblzma.so.5 (0x00002ab56e7ea000)
libbz2.so.1.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/bzip2-1.0.8-kkafkr2l5cgvnvv65llzx6wtjio6duyf/lib/libbz2.so.1.0 (0x00002ab56ea10000)
librt.so.1 => /lib64/librt.so.1 (0x00002ab56ec21000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ab56ee29000)
libiconv.so.2 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/libiconv-1.16-rfbb6qkjqz7myfzyaoq73lnn3c5uricd/lib/libiconv.so.2 (0x00002ab56f02d000)
libicuuc.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicuuc.so.65 (0x00002ab56f329000)
libicui18n.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicui18n.so.65 (0x00002ab56f764000)
/lib64/ld-linux-x86-64.so.2 (0x00002ab56a96b000)
libncursesw.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libncursesw.so.6 (0x00002ab56fcf0000)
libtinfo.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libtinfo.so.6 (0x00002ab56ff28000)
libicudata.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicudata.so.65 (0x00002ab570164000)
libtinfow.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libtinfow.so.6 (0x00002ab571e13000)

Helvetios


> ldd R/lib/RcppArmadillo/libs/RcppArmadillo.so 
linux-vdso.so.1 (0x00007ffcc8fee000)
libopenblas.so.0 => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc613c9a000)
libgfortran.so.5 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgfortran.so.5 (0x00007fc613828000)
libR.so => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/r-4.0.2-rr54swjmivnucqn53qlykyajbssspyzt/rlib/R/lib/libR.so (0x00007fc613196000)
libstdc++.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libstdc++.so.6 (0x00007fc612e12000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc612a90000)
libgcc_s.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgcc_s.so.1 (0x00007fc612878000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc6124b3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc6140e2000)
libquadmath.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libquadmath.so.0 (0x00007fc612273000)
libz.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/zlib-1.2.11-zolwez4onex3iueybyju2zi55njtdg5d/lib/libz.so.1 (0x00007fc61205c000)
libRblas.so => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/r-4.0.2-rr54swjmivnucqn53qlykyajbssspyzt/rlib/R/lib/libRblas.so (0x00007fc611e57000)
libintl.so.8 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/gettext-0.20.2-zsw7ypm6nwsd35emoejeqeqd24lkhtxl/lib/libintl.so.8 (0x00007fc611c4c000)
libreadline.so.8 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/readline-8.0-dz4azvauq6vvqbrsefly22tuc3tkvamk/lib/libreadline.so.8 (0x00007fc6119ff000)
libpcre2-8.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/pcre2-10.35-gatbldx3lflurv4nmrbt77xjeqpnat5u/lib/libpcre2-8.so.0 (0x00007fc6117a4000)
liblzma.so.5 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/xz-5.2.5-wohx2lptsznx3ncrffvc2aqzvk773ppl/lib/liblzma.so.5 (0x00007fc61157e000)
libbz2.so.1.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/bzip2-1.0.8-kkafkr2l5cgvnvv65llzx6wtjio6duyf/lib/libbz2.so.1.0 (0x00007fc61136d000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc611165000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc610f61000)
libiconv.so.2 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/libiconv-1.16-rfbb6qkjqz7myfzyaoq73lnn3c5uricd/lib/libiconv.so.2 (0x00007fc610c65000)
libicuuc.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicuuc.so.65 (0x00007fc61082a000)
libicui18n.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicui18n.so.65 (0x00007fc61029e000)
libgomp.so.1 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-haswell/gcc-4.8.5/gcc-8.4.0-fgpbrrd26pxv56imea5tnqj67vxh3a64/lib64/libgomp.so.1 (0x00007fc610070000)
libopenblas.so.0 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/openblas-0.3.10-by4yhm4acpz4ytmttv6y3hjvoda5qulz/lib/libopenblas.so.0 (0x00007fc60f2ee000)
libncursesw.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libncursesw.so.6 (0x00007fc60f0b6000)
libtinfo.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libtinfo.so.6 (0x00007fc60ee7a000)
libicudata.so.65 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/icu4c-65.1-fwwc3dj3bt2mm6dxkfxx4khyjjjysbqk/lib/libicudata.so.65 (0x00007fc60d1cb000)
libtinfow.so.6 => /ssoft/spack/arvine/v1/opt/spack/linux-rhel7-broadwell/gcc-8.4.0/ncurses-6.2-fso3hu2wf3op77fmj2qxpfpgidmvadsg/lib/libtinfow.so.6 (0x00007fc60cf8f000)
EnigmaSong commented 1 year ago

From profile results of graphest_fastgreedy(), updating trialLL parts, lines 206--211 in graphest_fastgreedy.cpp(#7)

          // lines 206--211
          deltaNegEnt = Delta_NegEnt(habSqrdCola,habSqrdColb,habSqrdEntryab,
                                     thetaCola,thetaColb,thetaEntryab);
          oldDeltaNegEnt = Delta_NegEnt(habSqrdCola,habSqrdColb,habSqrdEntryab,
                                        oldThetaCola,oldThetaColb,oldThetaEntryab);   
          // Update log-likelihood - O(k)
          trialLL += (deltaNegEnt-oldDeltaNegEnt)/sampleSize;

and here is Delta_NegEnt(). It does element-wise addition, multiplications, log, and dot products. I do not know what causes this huge timing difference.

double Delta_NegEnt(const arma::vec &habSqrdCola, 
                    const arma::vec &habSqrdColb,
                    const double &habSqrdEntryab,
                    const arma::vec &thetaCola,
                    const arma::vec &thetaColb,
                    const double &thetaEntryab){
  return(dot(habSqrdCola,(thetaCola%log(thetaCola) + (1.0-thetaCola)%log(1.0-thetaCola)))+
         dot(habSqrdColb,(thetaColb%log(thetaColb) + (1.0-thetaColb)%log(1.0-thetaColb)))-
         (habSqrdEntryab*(thetaEntryab*log(thetaEntryab) + (1.0-thetaEntryab)*log(1.0-thetaEntryab))));
}

Here are profiling results (in seconds):

Step Lab (Linux) Fidis Helvetios MacBook Air (M1, 2020)
Define variables 0.0038 0.0042 0.0016 0.0009
Initialization 0.0063 0.0082 0.0029 0.0014
Gen. iVecs 0.1336 0.1862 0.1416 0.0578
Update trials 2.1997 17.366 1.2732 0.7047
Prep. Label swap 1.0244 3.7415 0.8158 0.1203
Update oldThetas 3.6649 9.9694 2.3447 0.6425
Update Acounts 11.1232 40.125 10.3423 4.3768
Update Thetas 3.8592 5.2240 2.1793 0.9056
Update TrialLLs 360.331 690.693 12.4279 2.5998
Update BestLLs 0.3361 0.8009 0.2354 0.0653
Update oldBestLL and best counts 0.0001 0.0002 0.0001 0.0002
EnigmaSong commented 1 year ago

Is the issue from log() is not used openmp for element-wise log()? Armadillo library should use openmp with the option ARMA_USE_OPENMP and ARMA_OPENMP_THRESHOLD in config.hpp.