scarrazza closed this 5 years ago
Actually, I am not very surprised. I've found random number generation implementations to be really bad at parallelization, so I can believe the OpenCL devs didn't find any they liked or thought were useful.
On the other hand, I prefer the pyopencl API to cupy: it is much simpler and even asks you which hardware you would like to use before executing the kernel (with just 1 line of code).
BTW, the closest thing to curand I have found is http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html. However, I can't find the corresponding rand_max.
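On the rand_max question: as far as I understand the linked MWC64X generator, each call returns a full 32-bit unsigned word, so the natural normalisation is a division by 2**32. Below is a minimal pure-Python sketch of that idea. The multiplier and update rule are written from my reading of the linked page and should be checked against it; the function names and the seed are hypothetical, chosen only for illustration.

```python
MWC64X_A = 4294883355  # multiplier, as I read it on the MWC64X page (verify there)
MASK32 = 0xFFFFFFFF


def mwc64x_next(x, c):
    """Advance the (x, c) state once; return (output, new_x, new_c)."""
    out = (x ^ c) & MASK32      # 32-bit output word
    t = MWC64X_A * x + c        # multiply-with-carry step, fits in 64 bits
    return out, t & MASK32, t >> 32


def uniform01(x, c):
    """Map one 32-bit output to a float in [0, 1)."""
    out, x, c = mwc64x_next(x, c)
    return out / 2.0**32, x, c  # 2**32 plays the role of "rand_max + 1"


# Hypothetical seed; the state (0, 0) must be avoided because it never changes.
x, c = 12345, 678
samples = []
for _ in range(5):
    u, x, c = uniform01(x, c)
    samples.append(u)
```

In an OpenCL kernel the same normalisation would be a multiplication of the returned uint by 1.0f / 4294967296.0f, assuming single precision is acceptable.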
...
On the other hand, I prefer the pyopencl API to cupy: it is much simpler and even asks you which hardware you would like to use before executing the kernel (with just 1 line of code).
I have exactly the opposite feeling in C++, where I find CUDA much simpler and more elegant than the OpenCL version. I agree with you that pycuda has unnecessary boilerplate, but I think that's just because the support is experimental, and it will be fixed soonish.
That said, I have one big problem with OpenCL: everything is a string, which fucks up syntax highlighting in all editors.
Indeed, the C and C++ versions are horrible.
I just added a curand alternative; however, I still do not have any clue about the block/thread conversion from CUDA to OpenCL...
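For the block/thread conversion, a rough mapping (assuming a 1-D launch and zero global offset) is: threadIdx.x corresponds to get_local_id(0), blockIdx.x to get_group_id(0), blockDim.x to get_local_size(0), and gridDim.x to get_num_groups(0). The main trap is that CUDA's grid size counts blocks, while OpenCL's global size counts individual work-items. The sketch below only illustrates that arithmetic; the helper names are hypothetical and not part of this PR.

```python
def cuda_to_opencl_launch(grid_dim, block_dim):
    """Convert a 1-D CUDA launch (blocks, threads per block) to OpenCL sizes.

    CUDA counts blocks in the grid; OpenCL's global size counts work-items.
    """
    local_size = block_dim               # threads per block -> work-group size
    global_size = grid_dim * block_dim   # total number of work-items
    return global_size, local_size


def global_id(group_id, local_id, local_size):
    """Value of get_global_id(0), i.e. blockIdx.x * blockDim.x + threadIdx.x."""
    return group_id * local_size + local_id


# Example: a CUDA launch of 4 blocks with 256 threads each.
global_size, local_size = cuda_to_opencl_launch(4, 256)
last_index = global_id(group_id=3, local_id=255, local_size=local_size)
```

With pyopencl, the two sizes would be passed as the global and local size tuples when the kernel is enqueued, e.g. `(global_size,)` and `(local_size,)`.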
I'll merge everything into master, since we have reached the first milestone (GPU-CP code that produces the same result) and are moving towards the next one: FPGA.
Can we do that later today? I would like to make more tests.
The merging? I can leave this PR to merge last.
Yes, exactly, I mean this PR. This last commit reduces the time from 0.32 s to 0.19 s!!
I am afraid we changed the dimensions from 7 to 2 in yesterday's commit and that's where the change is coming from...
That said, it seems changing the number of threads doesn't make a big difference until you go below 10...
No, no, I just tested and we have a 0.32 s -> 0.2 s reduction.
That's strange, it doesn't work when I do it...
VEGAS MC numba, ncalls=1000000:
Results for interation 1: 1.0324372178489554 +/- 0.20601229671339305
Results for interation 2: 0.9504402931458804 +/- 0.08229174790224937
Results for interation 3: 0.9454731691298963 +/- 0.04612592055559303
Results for interation 4: 0.9523242932526317 +/- 0.030039048520191736
Results for interation 5: 0.9584911398891353 +/- 0.02098809870026352
(0.9554064165410279, 0.015772765065079884)
time (s): 0.34163641929626465
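As a sanity check on the log above: the final tuple appears consistent with an inverse-variance weighted average of the five iterations, which is the usual way VEGAS-style integrators combine results. The sketch below reproduces that combination from the printed numbers; `combine_iterations` is a hypothetical helper for illustration, not necessarily how the code in this PR computes it.

```python
import math


def combine_iterations(results):
    """Inverse-variance weighted mean and its error for (value, sigma) pairs."""
    weights = [1.0 / sigma**2 for _, sigma in results]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, results)) / total
    return mean, math.sqrt(1.0 / total)


# (result, error) pairs copied from the five iterations printed above.
iterations = [
    (1.0324372178489554, 0.20601229671339305),
    (0.9504402931458804, 0.08229174790224937),
    (0.9454731691298963, 0.04612592055559303),
    (0.9523242932526317, 0.030039048520191736),
    (0.9584911398891353, 0.02098809870026352),
]
mean, sigma = combine_iterations(iterations)
print(mean, sigma)
```

Because later iterations have smaller errors, they dominate the average, which is why the combined value sits close to the result of iteration 5.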
Umm, on dom I am getting:
VEGAS MC numba, ncalls=1000000:
Results for interation 1: 0.9224642279497275 +/- 0.0036813330595876206
Results for interation 2: 0.9309521554265873 +/- 0.0031585056976000865
Results for interation 3: 0.9391437769918975 +/- 0.0027468498672468845
Results for interation 4: 0.9466163396077905 +/- 0.0024172158294661844
Results for interation 5: 0.9532004759786942 +/- 0.0021491467216233715
(0.9424141922680981, 0.0012001987621295062)
time (s): 0.21056771278381348
I am afraid you are integrating 2 dimensions. Have a look at integrand.py
Just a first implementation. The current version uses just the events_kernel because, believe it or not, OpenCL does not provide an official curand.