Question about SIMD Optimization

RcppCore / RcppArmadillo

Rcpp integration for the Armadillo templated linear algebra library

Question about SIMD Optimization #404

Closed pati-ni closed 1 year ago

pati-ni commented 1 year ago

According to this guide, getting these optimizations with the native Armadillo library requires passing extra compiler parameters. How does this translate to RcppArmadillo? Are these optimizations already enabled?

eddelbuettel commented 1 year ago

RcppArmadillo is header-only (apart from the example / illustration that is fastLm()) so it is up to your package to organize its compiler flags.

The default configuration we set in the RcppArmadillo headers in order to wrap and set up for R use should not get in the way. If you find otherwise, let us know!

Demos with SIMD turned on would be welcome; this may make for a nice post at the Rcpp Gallery. Maybe you can write something up?

pati-ni commented 1 year ago

After running some benchmarks independently (outside of R) with Rcpp and RcppArmadillo, it looks like the proposed optimizations do not make a difference on a fairly recent compiler (gcc 12) with C++14.

Building with -O0, however, significantly slows down the benchmark. I did not look into the assembly to determine under which conditions SIMD instructions are or are not used. Since C++ compilation in R uses -O1 by default, I do not think there is much to gain from a guide. Feel free to close the issue.
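For anyone who does want to look at the assembly, a small kernel along these lines may be a convenient starting point. It is only a sketch: the function name saxpy and the loop itself are hypothetical and not taken from the benchmarks above. Compiling it once with -O0 and once with -O3 -march=native, then comparing the output of -S (or, on GCC, the report from -fopt-info-vec), should show whether the compiler emitted vector instructions for the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not from the thread): y[i] = a * x[i] + y[i].
// A plain counted loop over contiguous floats is the kind of code that
// GCC and Clang can usually auto-vectorize at higher optimization levels.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();   // raw pointers keep the loop body simple
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The result is the same at every optimization level; only the generated instructions differ, which is why inspecting the assembly or the vectorization report is needed to tell the builds apart.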

eddelbuettel commented 1 year ago

I had a similar hunch based on the work we have done with RcppSimdJson (which is crazy fast thanks to really well engineered code inside). But it does not require 'magic switches' -- modern compilers mostly know already. So yes: -O2 or -O3 coupled with -march=native will (on suitable modern hardware) already use all that is there, and the package should not leave anything behind. Which, I think, your tests confirmed. So closing this.
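A quick way to see what a given set of flags enables is to check the feature macros that GCC and Clang predefine. The sketch below is an illustration only: the helper name simd_features is hypothetical, and the macro list is a small selection rather than a complete one.

```cpp
#include <string>

// Hypothetical helper: lists the SIMD instruction sets the compiler was
// allowed to target for this translation unit, based on predefined macros.
std::string simd_features() {
    std::string out;
#if defined(__SSE2__)
    out += "SSE2 ";
#endif
#if defined(__AVX__)
    out += "AVX ";
#endif
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
#if defined(__AVX512F__)
    out += "AVX512F ";
#endif
#if defined(__ARM_NEON)
    out += "NEON ";
#endif
    if (out.empty()) {
        out = "none";  // no macro from the list above was defined
    }
    return out;
}
```

Building this with and without -march=native and printing the result shows which instruction sets the flags unlock. The macros only say what the compiler may emit, not that any particular loop was actually vectorized.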