N3PDF / mcgpu

Proof of concept of GPU integration

pyOpenCL #10

Closed scarrazza closed 5 years ago

scarrazza commented 5 years ago

Just a first implementation. The current version uses only the events_kernel because, believe it or not, OpenCL does not provide an official curand equivalent.

scarlehoff commented 5 years ago

Actually, I am not very surprised. I've found random number generation implementations to be really bad at parallelization, so I can believe the OpenCL devs didn't find any they liked or thought useful.

scarrazza commented 5 years ago

On the other hand, I prefer the pyopencl API to cupy: it is much simpler and even asks you which hardware you would like to use before executing the kernel (with just 1 line of code).

scarrazza commented 5 years ago

BTW, the closest thing to curand I have found is http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html. However, I can't find the corresponding rand_max...
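For reference, a minimal pure-Python sketch of the MWC64X update step, assuming the generator works as described on the linked page (a 64-bit state split into a 32-bit value and a 32-bit carry, with multiplier 4294883355). The function names here are hypothetical, not taken from mcgpu. If that reading is right, each draw is a full unsigned 32-bit integer, so the effective rand_max would be 2**32 - 1, and dividing by 2**32 gives a float in [0, 1).

```python
# Sketch of MWC64X (multiply-with-carry, 64-bit state, 32-bit output).
# Assumption: multiplier and update rule as recalled from the linked page.
MWC64X_A = 4294883355
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mwc64x_next(state):
    """Advance the 64-bit state and return (new_state, 32-bit output)."""
    x = state & MASK32          # low 32 bits: current value
    c = state >> 32             # high 32 bits: carry
    out = x ^ c                 # output mixes value and carry
    new_state = (x * MWC64X_A + c) & MASK64
    return new_state, out


def mwc64x_uniform(state):
    """Return (new_state, float in [0, 1)) by scaling the 32-bit output."""
    new_state, out = mwc64x_next(state)
    return new_state, out / 2.0**32


if __name__ == "__main__":
    state = 12345678901234567   # arbitrary non-zero seed, for illustration
    for _ in range(3):
        state, u = mwc64x_uniform(state)
        print(u)
```

In an OpenCL kernel each work-item would need its own independent state; how mcgpu seeds those streams is not shown in this thread.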

scarlehoff commented 5 years ago

> On the other hand, I prefer the pyopencl API to cupy: it is much simpler and even asks you which hardware you would like to use before executing the kernel (with just 1 line of code).

I have exactly the opposite feeling in C++, where I find CUDA much simpler and more elegant than the OpenCL version. I agree with you that pycuda has unnecessary boilerplate, but I think that's just because the support is experimental and it will be fixed soonish.

That said, I have one big problem with OpenCL, which is that everything is a string, and that breaks syntax highlighting in all editors.

scarrazza commented 5 years ago

Indeed, the C and C++ are horrible.

scarrazza commented 5 years ago

I just added a curand alternative; however, I still do not have a clue about the block/thread conversion from CUDA to OpenCL...
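On the block/thread question, the usual correspondence is that a CUDA block is an OpenCL work-group and a CUDA thread is a work-item. The main difference is that OpenCL takes the total number of work-items as the global size, not the number of groups. A small sketch of the arithmetic follows; the helper names are hypothetical and this is not the mcgpu code.

```python
# CUDA name                        OpenCL equivalent
# threadIdx.x                      get_local_id(0)
# blockIdx.x                       get_group_id(0)
# blockDim.x                       get_local_size(0)
# gridDim.x                        get_num_groups(0)
# blockIdx.x*blockDim.x+threadIdx.x  get_global_id(0)  (with zero global offset)


def cuda_to_opencl_sizes(blocks, threads_per_block):
    """Convert a CUDA launch <<<blocks, threads_per_block>>> into
    OpenCL (global_size, local_size) for a 1D kernel."""
    local_size = threads_per_block
    global_size = blocks * threads_per_block   # total work-items, not groups
    return global_size, local_size


def global_id(group_id, local_size, local_id):
    """Global index of a work-item, as get_global_id(0) would report it
    with a zero global offset."""
    return group_id * local_size + local_id


if __name__ == "__main__":
    print(cuda_to_opencl_sizes(4, 256))
    print(global_id(3, 256, 255))
```

With pyopencl, these two sizes would be passed as one-element tuples when enqueuing the kernel; in OpenCL 1.x the global size has to be a multiple of the local size.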

scarlehoff commented 5 years ago

I'll merge everything into master, since we have reached the first milestone (GPU and CPU code that produce the same result) and are moving towards the next one: FPGA.

scarrazza commented 5 years ago

Can we do that later today? I would like to make more tests.

scarlehoff commented 5 years ago

The merging? I can leave this PR to be merged last.

scarrazza commented 5 years ago

Yes, exactly, I mean this PR. This last commit reduces the time from 0.32 s to 0.19 s!!

scarlehoff commented 5 years ago

I am afraid we changed the dimensions from 7 to 2 in yesterday's commit and that's where the change is coming from...

That said, it seems changing the number of threads doesn't make a big difference until you go below 10...

scarrazza commented 5 years ago

No, no, I just tested and we have a 0.32s -> 0.2s reduction.

scarlehoff commented 5 years ago

That's strange, it doesn't work when I do it...


VEGAS MC numba, ncalls=1000000:
Results for interation 1: 1.0324372178489554 +/- 0.20601229671339305
Results for interation 2: 0.9504402931458804 +/- 0.08229174790224937
Results for interation 3: 0.9454731691298963 +/- 0.04612592055559303
Results for interation 4: 0.9523242932526317 +/- 0.030039048520191736
Results for interation 5: 0.9584911398891353 +/- 0.02098809870026352
(0.9554064165410279, 0.015772765065079884)
time (s): 0.34163641929626465

scarrazza commented 5 years ago

Umm, on dom I am getting:

VEGAS MC numba, ncalls=1000000:
Results for interation 1: 0.9224642279497275 +/- 0.0036813330595876206
Results for interation 2: 0.9309521554265873 +/- 0.0031585056976000865
Results for interation 3: 0.9391437769918975 +/- 0.0027468498672468845
Results for interation 4: 0.9466163396077905 +/- 0.0024172158294661844
Results for interation 5: 0.9532004759786942 +/- 0.0021491467216233715
(0.9424141922680981, 0.0012001987621295062)
time (s): 0.21056771278381348
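The final tuple in this log looks consistent with the standard VEGAS combination of iterations, an inverse-variance weighted average. This is inferred from the printed numbers, not confirmed against the mcgpu source. A short sketch, with the hypothetical helper name `combine_iterations` and the per-iteration values copied from the log above:

```python
import math

# (result, error) per iteration, copied from the log above.
iterations = [
    (0.9224642279497275, 0.0036813330595876206),
    (0.9309521554265873, 0.0031585056976000865),
    (0.9391437769918975, 0.0027468498672468845),
    (0.9466163396077905, 0.0024172158294661844),
    (0.9532004759786942, 0.0021491467216233715),
]


def combine_iterations(results):
    """Inverse-variance weighted mean and its error.

    Each iteration gets weight 1/sigma_i**2; the combined error is
    1/sqrt(sum of weights).
    """
    weights = [1.0 / sigma**2 for _, sigma in results]
    total = sum(weights)
    mean = sum(w * r for w, (r, _) in zip(weights, results)) / total
    return mean, 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Compare with the tuple printed at the end of the log.
    print(combine_iterations(iterations))
```

Because later iterations have smaller errors, they dominate the average, which is why the combined value sits closer to the last iterations than to the first.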

scarlehoff commented 5 years ago

I am afraid you are integrating 2 dimensions. Have a look at integrand.py