Allow external BLAS implementations

firelab / windninja

A diagnostic wind model developed for use in wildland fire modeling.

https://weather.firelab.org/windninja/

Allow external BLAS implementations #311

Open ksshannon opened 6 years ago

ksshannon commented 6 years ago

I'm curious whether some hand-tuned BLAS implementations would run faster for us on some hardware (OpenBLAS has architecture- and generation-specific code, I think). I also did some simple testing: we have our own implementation of dcopy (actually two), which the compiler doesn't optimize out until -O3 is set. I propose we introduce ninja_blas.c/h and either use the current internal implementations or allow the user to supply one. The MKL and BLAS implementations will be treated separately.
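
A minimal sketch of what such a wrapper might look like, to make the proposal concrete. The function name `ninja_dcopy` and the build switch `NINJA_USE_EXTERNAL_BLAS` are hypothetical and do not come from the WindNinja source. The external path assumes a CBLAS provider (for example OpenBLAS) is available at build time; the fallback follows the reference BLAS stride convention.

```c
#include <stddef.h>

/* Hypothetical build switch (not an existing WindNinja option): define
 * NINJA_USE_EXTERNAL_BLAS and link a CBLAS provider to use its dcopy. */
#ifdef NINJA_USE_EXTERNAL_BLAS
#include <cblas.h>
#endif

/* Hypothetical wrapper: copy n doubles from x (stride incx) to y (stride incy). */
void ninja_dcopy(int n, const double *x, int incx, double *y, int incy)
{
#ifdef NINJA_USE_EXTERNAL_BLAS
    /* Delegate to the user-supplied BLAS. */
    cblas_dcopy(n, x, incx, y, incy);
#else
    /* Internal fallback: plain strided copy. As in reference BLAS, a
     * negative stride walks that vector from its far end backwards. */
    if (n <= 0)
        return;
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] = x[ix];
        ix += incx;
        iy += incy;
    }
#endif
}
```

With a layout like this, callers only ever see `ninja_dcopy`, so switching between the internal loop and an external library becomes a compile-time decision rather than a code change.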

/cc @jforthofer

jforthofer commented 6 years ago

Sounds fine to me; it's been a long time since I looked into that BLAS stuff. It would be interesting to see if improvements could be made. It should be coded up to pretty much drop in, I think. The matrix-vector multiplication is by far the most computationally intensive function, and don't forget that the matrix is stored in compressed sparse row (CSR) format.
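
For reference, a generic CSR matrix-vector product can be sketched as below. This assumes a general (non-symmetric) matrix with zero-based indices; the function and array names are illustrative and not taken from WindNinja, whose actual storage layout may differ. Because the standard BLAS levels cover dense operands only, an external replacement for this kernel would have to come from a library's sparse routines rather than from plain BLAS.

```c
/* Illustrative CSR matrix-vector product: y = A * x.
 *   n_rows  : number of rows in A
 *   row_ptr : n_rows + 1 offsets; row i owns entries row_ptr[i] .. row_ptr[i+1]-1
 *   col_idx : column index of each stored entry
 *   val     : value of each stored entry
 * All names are hypothetical; indices are assumed to be zero-based. */
void csr_matvec(int n_rows, const int *row_ptr, const int *col_idx,
                const double *val, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Accumulate only the stored (non-zero) entries of row i. */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

The inner loop reads `x` through `col_idx`, so memory access is irregular. That indirect access is one reason tuned sparse kernels can behave quite differently from tuned dense BLAS routines on the same hardware.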