Closed: maweigert closed this issue 7 years ago
Thanks for the report. That's strange, I cannot reproduce it on my old laptop with a GeForce 9400M (I get an error of 9e-7). I will have access to my Mac with a GF750M next week when I'm at my office, so I can try it out there too. Could you tell me which version/revision of reikna you are using?
Also, try the following reikna-only code:
from __future__ import print_function
import numpy as np
from reikna import cluda
from reikna.fft import FFT
dshape = (128,)*2
np.random.seed(0)
input = (np.random.uniform(-1,1,dshape)).astype(np.complex64)
thr = cluda.ocl_api().Thread.create(interactive=True)
buf_g = thr.to_device(input)
fft = FFT(buf_g).compile(thr, fast_math = True)
fft(buf_g, buf_g)
fft(buf_g, buf_g, inverse = True)
output = buf_g.get()
print("{}:\t\t{}".format(thr._device.name, np.amax(np.abs(input-output))))
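As a point of comparison, the same roundtrip can be run on the CPU with numpy.fft to see what magnitude of error to expect from a healthy transform. This is only a minimal sketch and not part of the original exchange: the helper name cpu_roundtrip_error is made up for illustration, and numpy.fft may compute in double precision depending on the NumPy version, so the value is a rough baseline rather than an exact match for the GPU result.

```python
import numpy as np

def cpu_roundtrip_error(shape, seed=0):
    """Max abs error of ifftn(fftn(x)) against x for complex64 input."""
    rng = np.random.RandomState(seed)
    data = rng.uniform(-1, 1, shape).astype(np.complex64)
    # Forward and inverse transform over all axes; cast back to complex64
    # so the comparison uses the same data type as the GPU test.
    restored = np.fft.ifftn(np.fft.fftn(data)).astype(np.complex64)
    return float(np.amax(np.abs(data - restored)))

if __name__ == "__main__":
    for shape in [(128, 128), (1024,), (4096,)]:
        print("CPU:\tshape = {}\t{}".format(shape, cpu_roundtrip_error(shape)))
```

A roundtrip error of order 1, as opposed to a value close to single-precision rounding error, would point at the transform itself rather than at the test data.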
Also, could you do some more tests?
- If you have pycuda installed, replace ocl_api with cuda_api and check if the bug is still there;
- Does it still happen with fast_math=False?
- Does it also happen for an array of 128 elements?
Thanks for looking into that!
system/version: Mac OSX 10.11.6, python 2.7.12, reikna 0.6.7
the bug persists with the reikna-only code
GeForce GT 750M: shape = (128, 128) fast_math = True 1.1197116
using pycuda/cuda_api works fine
GeForce GT 750M: shape = (128, 128) fast_math = True 7.882949e-07
switching to fast_math = False works fine
GeForce GT 750M: shape = (128, 128) fast_math = False 4.618963e-07
strangely enough, it fails for some 1D shapes, too:
GeForce GT 750M: shape = (64,) fast_math = True 2.27608268233e-07
GeForce GT 750M: shape = (128,) fast_math = True 2.01620565576e-07
GeForce GT 750M: shape = (256,) fast_math = True 3.03235623278e-07
GeForce GT 750M: shape = (512,) fast_math = True 3.45577376493e-07
GeForce GT 750M: shape = (1024,) fast_math = True 0.976445674896
GeForce GT 750M: shape = (2048,) fast_math = True 3.66386217365e-07
GeForce GT 750M: shape = (4096,) fast_math = True 1.1036427021
GeForce GT 750M: shape = (8192,) fast_math = True 1.1503098011
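A table like the one above could be produced with a small sweep over shapes and fast_math settings. The following is a sketch only, built from the reikna calls used in the test script earlier in this thread; the helper names roundtrip_error, format_row and sweep are hypothetical, and the error function is passed in as a parameter so that the loop can be exercised without a GPU.

```python
import numpy as np

def roundtrip_error(thr, shape, fast_math):
    """Forward + inverse FFT on the device; returns the max abs deviation."""
    from reikna.fft import FFT  # imported lazily so the helpers load without reikna
    np.random.seed(0)
    data = np.random.uniform(-1, 1, shape).astype(np.complex64)
    buf = thr.to_device(data)
    fft = FFT(buf).compile(thr, fast_math=fast_math)
    fft(buf, buf)                # forward transform, in place
    fft(buf, buf, inverse=True)  # inverse transform, in place
    return float(np.amax(np.abs(data - buf.get())))

def format_row(device_name, shape, fast_math, err):
    """One line in the same layout as the results listed in this thread."""
    return "{}:\tshape = {}\tfast_math = {}\t{}".format(
        device_name, shape, fast_math, err)

def sweep(thr, shapes, fast_math_values=(True, False), error_fn=roundtrip_error):
    """Collect (shape, fast_math, error) for every combination."""
    rows = []
    for shape in shapes:
        for fast_math in fast_math_values:
            rows.append((shape, fast_math, error_fn(thr, shape, fast_math)))
    return rows

if __name__ == "__main__":
    from reikna import cluda
    thr = cluda.ocl_api().Thread.create(interactive=True)
    # 1D sizes 64 .. 8192 plus the original 2D case
    shapes = [(2 ** k,) for k in range(6, 14)] + [(128, 128)]
    for shape, fast_math, err in sweep(thr, shapes):
        print(format_row(thr._device.name, shape, fast_math, err))
```

Swapping cluda.ocl_api() for cluda.cuda_api() in the main block would run the same sweep through CUDA, which is the comparison reported above.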
it seems the culprit is in the native_sin/cos function, as removing the following ifdef switch
#ifdef COMPILE_FAST_MATH
res.x = native_cos(theta);
res.y = native_sin(theta);
#else
in cluda/functions.mako restores normal behaviour. Yet this is strange, as pyfft does almost the same in the kernels with fast_math=True but runs fine on the same GPU.
Yes, it is quite strange. Removing the native_cos()/native_sin() usage pretty much negates any performance benefit from fast_math=True, so I would rather not do that.
I suspect there may be some bug in Apple's OpenCL driver (I have found several over the years myself). It is usually some kind of strange interplay between the exact GPU operations invoked and the global/local size. My general approach in such cases is to isolate the offending kernel and start removing parts until I end up with something that reproduces the bug and is small enough to open an issue in Apple's tracker. It is quite a lengthy process, though, and I completely understand if you don't want to go through it.
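One possible starting point for such an isolation is a standalone kernel that only calls native_cos()/native_sin() and compares the results against a host reference. The sketch below uses PyOpenCL and is an assumption-laden illustration, not the reikna kernel: the kernel name trig_native and the helper names are made up, the precision of native_* functions is implementation-defined in OpenCL (so small deviations are expected), and a kernel this simple may well not trigger a bug that depends on the global/local sizes.

```python
import numpy as np

KERNEL_SRC = """
__kernel void trig_native(__global const float *theta,
                          __global float *cos_out,
                          __global float *sin_out)
{
    int i = get_global_id(0);
    cos_out[i] = native_cos(theta[i]);
    sin_out[i] = native_sin(theta[i]);
}
"""

def twiddle_angles(n):
    """Angles -2*pi*k/n for k = 0..n-1, as used by a length-n FFT."""
    return (-2.0 * np.pi * np.arange(n) / n).astype(np.float32)

def max_abs_error(values, reference):
    """Largest absolute deviation of values from a float64 reference."""
    return float(np.amax(np.abs(np.asarray(values, dtype=np.float64) - reference)))

def check_native_trig(n=1024):
    """Run the kernel on an OpenCL device; returns (cos error, sin error)."""
    import pyopencl as cl             # imported lazily: needs an OpenCL device
    import pyopencl.array as cl_array
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    theta = twiddle_angles(n)
    theta_g = cl_array.to_device(queue, theta)
    cos_g = cl_array.empty_like(theta_g)
    sin_g = cl_array.empty_like(theta_g)
    # Local size None lets the driver pick the work-group size.
    prg.trig_native(queue, (n,), None, theta_g.data, cos_g.data, sin_g.data)
    ref = theta.astype(np.float64)
    return (max_abs_error(cos_g.get(), np.cos(ref)),
            max_abs_error(sin_g.get(), np.sin(ref)))

if __name__ == "__main__":
    print("native_cos / native_sin max abs error: {}".format(check_native_trig()))
```

If the errors here stay small while the FFT roundtrip still fails, the next step would be to pass explicit local sizes matching the ones the FFT kernels use.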
I have tested the code on OSX 10.11.3, and could not reproduce the bug, but it was a FirePro video card, so the local sizes used could be different. Could you do several more things:
- Comment out the #if block in cluda/kernel.mako starting from #if defined(cl_khr_fp64). This seems to be one of the differences from pyfft, which only enables it when the array has a double-precision datatype.
- Check and tell me which global/local sizes reikna and pyfft use (let's say for the smallest array for which the bug is reproduced, that is, the 1D one with 1024 elements). For the reikna code, add the following lines:
for call in fft._kernel_calls:
    print(call._kernel.global_size, call._kernel.local_size)
For the pyfft code (add after the actual call, since the kernels are created on the first invocation):
for k in plan._kernels:
    print(k._func_forward._global_size, k._func_forward._block_size)
> I suspect there may be some bug in Apple's OpenCL driver (I have found several over the years myself)
Indeed, that was it!! After installing the Nvidia Web drivers (346.03.15f02) everything was fine again. So it seems the default drivers on El Capitan (310.42.25f01) have a bug in the native_sin/cos functions.
Thanks for your help!
Hi Bogdan,
I was porting some FFT-based code from pyfft to reikna and was experiencing some inaccuracies in the FFT calculations with fast_math, depending on the hardware I am using.
I did the following simple roundtrip comparison
https://gist.github.com/maweigert/0bb5d16b3bb9a3d0659c7d48ee8fd32a
and got very different behaviour depending on the GPU:
While pyfft on the same input (and fast_math = True) gave
So it seems to be specific to reikna rather than to the GPU.
Did you ever see something similar, or can you reproduce this?
Cheers and thanks for the package!
M