drostlab / myTAI

Evolutionary Transcriptomics with R
https://drostlab.github.io/myTAI/
GNU General Public License v2.0

OpenMP for parallelisation with the Apple M1 chip #38

Open LotharukpongJS opened 11 months ago

LotharukpongJS commented 11 months ago

Describe the bug

The speed-up from parallelisation that is achieved on Intel Macs is not achieved on the M1 chip. The difference in chip affects the paths used in README.md and src/Makevars:

https://github.com/drostlab/myTAI/blob/699b78f10a619cfd97e584cf2277f9d04e938544/README.md?plain=1#L35-L37

https://github.com/drostlab/myTAI/blob/699b78f10a619cfd97e584cf2277f9d04e938544/src/Makevars#L1-L13

With the M1 chip, /usr/local/opt/libomp/lib/libomp.dylib, /usr/local/opt/libomp/include and /usr/local/opt/libomp/lib do not exist.

$ ls /usr/local/opt/libomp/lib/libomp.dylib
ls: /usr/local/opt/libomp/lib/libomp.dylib: No such file or directory

Instead the homologous locations are probably:

/usr/local/opt/libomp/lib/libomp.dylib -> /opt/homebrew/opt/libomp/lib/libomp.dylib
/usr/local/opt/libomp/include -> /opt/homebrew/opt/libomp/include
/usr/local/opt/libomp/lib -> /opt/homebrew/opt/libomp/lib
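The two prefixes in this mapping could be probed at build time rather than hard-coded. A minimal shell sketch, assuming only the two Homebrew locations named in this report; `find_libomp_prefix` is a hypothetical helper and not part of myTAI:

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that contains
# both include/ and lib/ subdirectories; return 1 if none does.
find_libomp_prefix() {
    for prefix in "$@"; do
        if [ -d "$prefix/include" ] && [ -d "$prefix/lib" ]; then
            printf '%s\n' "$prefix"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from this report: Apple silicon first, then Intel.
LIBOMP_PREFIX=$(find_libomp_prefix /opt/homebrew/opt/libomp /usr/local/opt/libomp) \
    || echo "libomp not found in either Homebrew prefix" >&2
echo "LIBOMP_PREFIX=${LIBOMP_PREFIX}"
```

On a machine with neither directory, `LIBOMP_PREFIX` stays empty and a message goes to stderr, so a caller can decide whether to build without OpenMP.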

In an attempt to solve it, I installed the libraries via brew (arch -arm64 brew reinstall libomp) and changed the locations in the src/Makevars to correspond to the messages in the brew installation:

For compilers to find libomp you may need to set:
  export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
  export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

Thus for src/Makevars:

# Disable long types from C99 or CPP11 extensions
PKG_CPPFLAGS = -I../src -DRCPP_DEFAULT_INCLUDE_CALL=false -DCOMPILING_MYTAI -DBOOST_NO_INT64_T -DBOOST_NO_INTEGRAL_INT64_T -DBOOST_NO_LONG_LONG -DRCPP_USING_UTF8_ERROR_STRING -DRCPP_USE_UNWIND_PROTECT ${MYTAI_COMPILER_FLAGS}

OPENMP_SUPPORTED := $(shell $(CC) -fopenmp -dM -E - < /dev/null 2>&1 | grep -c "openmp")
LIBOMP_SUPPORTED := $(shell [ -d /opt/homebrew/opt/libomp/include ] && echo 1)
ifeq ($(OPENMP_SUPPORTED),1)
 ifeq ($(LIBOMP_SUPPORTED),1)
    PKG_CPPFLAGS += -I/opt/homebrew/opt/libomp/include
    LDFLAGS=-L/opt/homebrew/opt/libomp/lib
    PKG_CXXFLAGS += -Xpreprocessor -fopenmp
    PKG_LIBS += -lomp
 endif
endif

I also added the symlink as suggested in the README.md, with modifications I thought were appropriate.

$ cd /usr/local/lib
$ ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib ./libomp.dylib
$ ls -l libomp.dylib
lrwxr-xr-x  1 root  wheel  43 Aug 18 11:08 libomp.dylib -> /opt/homebrew/lib/gcc/current/libgomp.dylib
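
The listing above shows the link resolving to GCC's `libgomp.dylib` rather than to the Homebrew `libomp.dylib` named in the `ln -s` command. A small sketch for checking which OpenMP runtime a link points at; `omp_runtime_of` is a hypothetical helper, and it inspects only the file name of the link target, not the library contents:

```shell
#!/bin/sh
# Hypothetical helper: classify a symlink by the file name of its target.
omp_runtime_of() {
    # readlink fails if the path is missing or is not a symlink.
    target=$(readlink "$1") || { echo "not-a-symlink"; return 1; }
    case "${target##*/}" in
        libomp*)  echo "llvm-libomp" ;;
        libgomp*) echo "gcc-libgomp" ;;
        *)        echo "unknown" ;;
    esac
}

# Path taken from the listing above; it may not exist on other machines.
omp_runtime_of /usr/local/lib/libomp.dylib || true
```

The `__kmpc_` entry points belong to the LLVM runtime (libomp), so a link that resolves to libgomp would be consistent with the `symbol not found` error reported below, although this check alone does not prove that it is the cause.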

I then ran roxygen2::roxygenise(), which failed at the end with the following error:

─  DONE (myTAI)
Error in dyn.load(dll_copy_file) : 
  unable to load shared object '/var/folders/p0/9gxqqj352q50zc6ssdncrhv80007sq/T//RtmpNEFRUA/pkgload770229cb13/myTAI.so':
  dlopen(/var/folders/p0/9gxqqj352q50zc6ssdncrhv80007sq/T//RtmpNEFRUA/pkgload770229cb13/myTAI.so, 0x0006): symbol not found in flat namespace '___kmpc_barrier'

Is there a way to resolve this?

Expected behaviour

The speed-up from parallelisation that is achieved on Intel Macs is also achieved on the M1 chip.

Session info:

> utils::sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] testthat_3.1.10

loaded via a namespace (and not attached):
 [1] fs_1.6.3            usethis_2.2.0       devtools_2.4.5      doParallel_1.0.17  
 [5] RColorBrewer_1.1-3  rprojroot_2.0.3     tools_4.2.2         profvis_0.3.8      
 [9] backports_1.4.1     utf8_1.2.3          R6_2.5.1            nortest_1.0-4      
[13] colorspace_2.1-0    urlchecker_1.0.1    withr_2.5.0         tidyselect_1.2.0   
[17] gridExtra_2.3       prettyunits_1.1.1   processx_3.8.2      myTAI_1.0.1.9000   
[21] compiler_4.2.2      cli_3.6.1           xml2_1.3.4          desc_1.4.2         
[25] scales_1.2.1        readr_2.1.4         callr_3.7.3         stringr_1.5.0      
[29] digest_0.6.33       pkgconfig_2.0.3     htmltools_0.5.5     sessioninfo_1.2.2  
[33] fastmap_1.1.1       htmlwidgets_1.6.2   rlang_1.1.1         rstudioapi_0.14    
[37] shiny_1.7.4         generics_0.1.3      farver_2.1.1        dplyr_1.1.2        
[41] car_3.1-2           magrittr_2.0.3      Matrix_1.6-0        Rcpp_1.0.11        
[45] munsell_0.5.0       fansi_1.0.4         abind_1.4-5         lifecycle_1.0.3    
[49] stringi_1.7.12      carData_3.0-5       MASS_7.3-60         decor_1.0.1        
[53] brio_1.1.3          pkgbuild_1.4.0      plyr_1.8.8          grid_4.2.2         
[57] parallel_4.2.2      promises_1.2.0.1    crayon_1.5.2        miniUI_0.1.1.1     
[61] lattice_0.21-8      cowplot_1.1.1       splines_4.2.2       hms_1.1.3          
[65] knitr_1.43          ps_1.7.5            pillar_1.9.0        ggpubr_0.6.0       
[69] ggsignif_0.6.4      reshape2_1.4.4      codetools_0.2-19    pkgload_1.3.2.1    
[73] glue_1.6.2          remotes_2.4.2       vctrs_0.6.3         tzdb_0.4.0         
[77] httpuv_1.6.11       foreach_1.5.2       gtable_0.3.3        purrr_1.0.2        
[81] tidyr_1.3.0         cachem_1.0.8        ggplot2_3.4.3       cpp11_0.4.6        
[85] xfun_0.39           mime_0.12           xtable_1.8-4        broom_1.0.5        
[89] roxygen2_7.2.3      rstatix_0.7.2       later_1.3.1         survival_3.5-5     
[93] tibble_3.2.1        iterators_1.0.14    memoise_2.0.1       fitdistrplus_1.1-11
[97] ellipsis_0.3.2   
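
The session info above reports the platform as `x86_64-apple-darwin17.0`, while libomp was reinstalled with `arch -arm64 brew`. A sketch for comparing the two architectures; `platform_arch` is a hypothetical helper, and the `arm64` value for the library is an assumption based on the brew command in this report, not something read from the installed file:

```shell
#!/bin/sh
# Hypothetical helper: reduce an R platform triplet to a CPU architecture name.
platform_arch() {
    case "${1%%-*}" in
        aarch64|arm64) echo "arm64" ;;
        x86_64)        echo "x86_64" ;;
        *)             echo "unknown" ;;
    esac
}

# Platform string copied from the session info above.
r_arch=$(platform_arch "x86_64-apple-darwin17.0")
# Assumption: libomp was installed for arm64 (arch -arm64 brew reinstall libomp).
lib_arch="arm64"

if [ "$r_arch" != "$lib_arch" ]; then
    echo "R is $r_arch but libomp is $lib_arch"
fi
```

If the two differ, the package would be compiled for one architecture and linked against a library built for the other, which is worth ruling out before changing src/Makevars further.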

LotharukpongJS commented 11 months ago

For further context, the multithreading should be seen when running myTAI::PlotSignatureTransformed() for example:

> library(myTAI)
> data("PhyloExpressionSetExample")
> myTAI::PlotSignatureTransformed(PhyloExpressionSetExample)
Proceeding with the FlatLineTest

Generating PlotSignature() for transformation: none
Plot signature: ' TAI ' and test statistic: ' FlatLineTest ' running  1000  permutations.

[ Number of Eigen threads that are employed on your machine: 1 ]

[ Computing age assignment permutations for test statistic ... ]
[=========================================] 100%   
[ Computing variances of permuted transcriptome signatures ... ]

[ Number of Eigen threads that are employed on your machine: 1 ]

[ Computing age assignment permutations for test statistic ... ]
[=========================================] 100%   
[ Computing variances of permuted transcriptome signatures ... ]

Total runtime of your permutation test: 3.97  seconds.

-> We recommended using at least 20000 permutations to achieve a sufficient permutation test.

etc.

The line "Number of Eigen threads that are employed on your machine: 1" should read "Number of Eigen threads that are employed on your machine: 8" if multithreading is working on my machine :)

Anyway, wishing you all a nice Friday afternoon!

HajkD commented 11 months ago

Dear @LotharukpongJS

Thank you very much for making me aware of this!

@lavakin and I will look into this in detail.

With very best wishes, Hajk

HajkD commented 10 months ago

Dear All,

Maybe some guidelines here could be useful: https://mac.r-project.org/openmp/ ?

Many thanks, Hajk