Closed cgarling closed 1 year ago
Not having much luck with the profiler at the moment; it looks like there is a bug with graphics over WSL. I tried MKL and the timing results were the same. Performance is slightly better when the number of BLAS threads is set to 1 via LinearAlgebra.BLAS.set_num_threads(1) than when we have 8 BLAS threads. I don't imagine it's a big issue for many people (my solve goes from 2.8 to 2.2 seconds), but it's worth remembering.
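For anyone wanting to try the single-threaded BLAS setting without changing it for the rest of the session, here is a minimal sketch. The helper name `with_blas_threads` and the matrix product are hypothetical stand-ins for illustration, not part of this package or of Optim.jl; only `BLAS.get_num_threads` and `BLAS.set_num_threads` are real `LinearAlgebra` calls.

```julia
using LinearAlgebra

# Hypothetical helper: run `f` with `n` BLAS threads, then restore the
# previous setting even if `f` throws.
function with_blas_threads(f, n::Integer)
    previous = BLAS.get_num_threads()
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(previous)
    end
end

# Stand-in workload (assumption): a small dense product, roughly the kind
# of BLAS call where threading overhead can outweigh the benefit.
product = with_blas_threads(1) do
    A = rand(200, 200)
    A * A'
end
```

Wrapping the real solve in `@time` inside the `do` block, once with 1 thread and once with the default count, should show whether the difference reported above applies to a given problem size.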
I made a note in the documentation and in the example Jupyter notebook; think that's all I'll do for now.
Optimizations using Optim.jl's BFGS implementation seem to be spending a lot of time in BLAS calls. By default BLAS is multi-threaded, but for problems of our size this threading is not efficient. Worth profiling and examining more closely.
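One way to check where the time goes without a graphical viewer is the text output of Julia's `Profile` standard library. The following is only a sketch: `standin_workload` is a hypothetical placeholder for the real BFGS solve, and the matrix size and iteration count are arbitrary assumptions.

```julia
using LinearAlgebra
using Profile

# Hypothetical stand-in for the optimization: repeated dense matrix
# products, which dispatch to BLAS.
function standin_workload(n::Integer, iters::Integer)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)
    acc = 0.0
    for _ in 1:iters
        mul!(C, A, B)   # in-place product, handled by BLAS gemm
        acc += C[1, 1]
    end
    return acc
end

standin_workload(50, 1)      # warm up so compilation is not profiled

Profile.clear()
result = @profile standin_workload(200, 200)

# Flat text report sorted by sample count; no graphics required.
Profile.print(format = :flat, sortedby = :count)
```

Replacing the call to `standin_workload` with the actual fit should show whether the BLAS routines dominate the sample counts.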