pygy opened 10 years ago

Do you plan to migrate to the 2.1 branch? It is faster than v2.0.x, and AFAIK stable enough to be used in production at CloudFlare.
Hi,
Actually, I was just waiting for the first stable release of the 2.1 branch but, as you suggest, it is probably OK to migrate with the next release of GSL Shell. I actually have a lot of minor changes to include, and a new release is a good thing.
Otherwise, what about a new GSL Shell branch on GitHub to integrate LuaJIT 2.1?
> Otherwise, what about a new GSL Shell branch on GitHub to integrate LuaJIT 2.1?
Why not.
AFAICT, the parse.c and Makefile modifications work as-is in 2.1.
I'll also have to send you a patch for compiling on OS X 10.8.
Now there is a v2.1 branch in GSL Shell's repository: https://github.com/franko/gsl-shell/tree/master-lj2.1
The merge was very easy thanks to the power of git :-) and everything seems to work just fine.
Francesco
Cool :-)
The Julia guys are about to add the LuaJIT/GSL Shell benchmarks you wrote to their home page. I'll point them to the LJ 2.1 branch.
LuaJIT v2.1 is 10 times faster than v2.0 for parseint, but a bit slower for mandel (in both cases, though, it still beats the hell out of C :-).
The pure JavaScript (V8) implementation of rand_mat_stat is faster than its GSL Shell counterpart, which relies on BLAS, as do the C, Julia and Fortran benchmarks. The latter three are also faster than LuaJIT/GSL Shell. Maybe you're not using the same BLAS?
LuaJIT is ~10 times slower than C for rand_mat_mul, but faster than JS.
Check here for the results on my machine: https://github.com/JuliaLang/julia/commit/9a57b996c91383527404f1adbdc0b29af8e6f798#commitcomment-5996981
Edit: note also that quicksort can be made faster by switching to an FFI array.
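A quick sketch of what I mean (illustrative, not the benchmark code as-is): the usual benchmark quicksort, ported to a 0-based FFI double array.

local ffi = require("ffi")
local darray = ffi.typeof("double[?]")

-- In-place quicksort over a[lo..hi] (inclusive, 0-based indices).
local function qsort(a, lo, hi)
    local i, j = lo, hi
    while i < hi do
        local pivot = a[math.floor((lo + hi) / 2)]
        while i <= j do
            while a[i] < pivot do i = i + 1 end
            while a[j] > pivot do j = j - 1 end
            if i <= j then
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
            end
        end
        if lo < j then qsort(a, lo, j) end
        lo, j = i, hi
    end
    return a
end

local n = 5000
local v = darray(n)
for k = 0, n - 1 do v[k] = math.random() end
qsort(v, 0, n - 1)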
The benchmark results look good to me.
I agree that there are some odd things. I already noticed in the past that Julia was faster in rand_mat_mul, but I cannot tell the reason. The only thing I can suggest is to ensure that OpenBLAS is actually used for GSL Shell. To me, the speed advantage should come from the underlying BLAS implementation.
Otherwise, I would not be too picky about these benchmark results, and I'm afraid I don't have enough time to further investigate the problem.
In any case, I will be glad if they include lua/gsl-shell on their benchmark page. Thank you for your help with that.
How can I set the BLAS version?
On my machine, the GSL-based rand_mat_mul is ~10% faster than a straight port of the JavaScript code to Lua:
local ffi = require("ffi") -- 'rng' and 'timeit' below come from GSL Shell and the benchmark script
local darray = ffi.typeof("double[?]")
-- Fill a length-n array of doubles with uniform random values.
local function randd(n)
local v, r
v = darray(n)
r = rng.new('rand')
for i = 0, n-1 do
v[i] = r:get()
end
return v
end
-- Transpose mxn matrix.
local function mattransp(A, m, n)
local T = darray(m * n)
for i = 0, m - 1 do
for j = 0, n-1 do
T[j * m + i] = A[i * n + j]
end
end
return T
end
-- Multiply the m-by-l matrix A by the l-by-n matrix B.
local function matmul(A, B, m, l, n)
local C, total
C = darray(m*n)
-- Transpose B to take advantage of memory locality.
B = mattransp(B,l,n)
for i = 0, m - 1 do
for j = 0, n - 1 do
total = 0
for k = 0, l - 1 do
total = total + A[i*l+k]*B[j*l+k]
end
C[i*n+j] = total
end
end
return C
end
local function randmatmulLJ(n)
local A, B
A = randd(n*n)
B = randd(n*n)
return matmul(A, B, n, n, n)
end
timeit(|| randmatmul(1000), "rand_mat_mul") --> 1129.19
timeit(|| randmatmulLJ(1000), "rand_mat_mul_LJ") --> 1255.42
BTW:
$ node perf.js
...
javascript,rand_mat_mul,2933
:-)
To check the BLAS library, you have to "ldd" the executable and see which file libblas.so points to by using "ls -l" on it.
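For example (the library path is illustrative; it varies by system):
$ ldd gsl-shell | grep blas
$ ls -l /usr/lib/libblas.so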
I'm now wondering if Julia is faster because it transposes the matrix before the multiplication, just like the JS code does. In principle I should do some tests with dgemm, with and without transpose, like in the JS code, but unfortunately I don't have time to work on that.
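If someone wants to try, a rough sketch of such a test driven from LuaJIT's FFI could look like this (untested; the library name in ffi.load is an assumption, and 101/111/112 are the standard CBLAS enum constants from cblas.h):

local ffi = require("ffi")
ffi.cdef[[
void cblas_dgemm(int Order, int TransA, int TransB,
                 int M, int N, int K, double alpha,
                 const double *A, int lda,
                 const double *B, int ldb,
                 double beta, double *C, int ldc);
]]
local blas = ffi.load("gslcblas") -- or "openblas", depending on what gsl-shell links against

local RowMajor, NoTrans, Trans = 101, 111, 112 -- standard CBLAS enum values
local darray = ffi.typeof("double[?]")
local n = 1000
local A, B, C = darray(n * n), darray(n * n), darray(n * n)
for k = 0, n * n - 1 do A[k] = math.random(); B[k] = math.random() end

-- C = A * B, reading B as stored
local t0 = os.clock()
blas.cblas_dgemm(RowMajor, NoTrans, NoTrans, n, n, n, 1.0, A, n, B, n, 0.0, C, n)
print("no transpose:", os.clock() - t0)

-- C = A * B^T, letting dgemm handle the transpose internally
t0 = os.clock()
blas.cblas_dgemm(RowMajor, NoTrans, Trans, n, n, n, 1.0, A, n, B, n, 0.0, C, n)
print("with transpose:", os.clock() - t0)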
There's no ldd on OS X; otool -L does the trick.
$ otool -L gsl-shell | grep blas
/usr/local/lib/libgslcblas.0.dylib (compatibility version 1.0.0, current version 1.0.0)
$ ls -l /usr/local/lib/libgslcblas.0.dylib
lrwxr-xr-x 1 pygy staff 42 Apr 12 23:57 /usr/local/lib/libgslcblas.0.dylib -> ../Cellar/gsl/1.16/lib/libgslcblas.0.dylib
GSL, as installed by brew, relies on the default libgslcblas. I've tried to redirect the symlink to a freshly compiled OpenBLAS, but it complains about version issues (1.0.0 required, 0.0.0 found). The same goes for the Julia BLAS.
I'm also trying to build GSL by hand, but I don't know how to tell it to use another BLAS.
I got it to compile with OpenBLAS (by adding the proper paths and options in the GSL Shell Makefile).
rand_mat_mul is now as fast as C/Julia :-)
It may be nice to add the possibility to customize LIBS and LDFLAGS in makeconfig.
Good :-)
Actually, the libraries are supposed to be configurable using the file "makepackages", but maybe this is not very intuitive.
On Linux, "makepackages" links with whatever "blas" library the system provides (using GSL_LIBS), so OpenBLAS is not required. It is possible to modify the default makefile to link explicitly to OpenBLAS, but I'm not sure this is a good idea.
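For example, on a system with OpenBLAS installed, something like this in "makepackages" should be enough (the exact line is hypothetical, I have not tested it; GSL_LIBS is the variable to touch):
GSL_LIBS = -lgsl -lopenblas -lm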
Maybe a warning can be shown at compile time if the gslcblas library is used, since the latter is really slow.
Suggestions & patches are welcome.
makepackages is probably fine... I tend to explore code rather than read the docs (too often, there are none), and I thought that makeconfig was where users were supposed to tweak things.
OS X also provides a fast BLAS; I'll look into how to link to it.
I found the system BLAS, which is even faster than OpenBLAS, but I don't know if it is found at the same path for all OS X versions.
Edit: actually, adding -lBLAS to the GSL_LIBS does the trick, without adding any path to the linker (which makes sense, since the system BLAS lives on the default linker search path).