scarlehoff closed this 5 years ago.
Looks good and runs well. I'm pretty sure we should get similar results on AMD (ahaha, OpenCL), since it uses the same HBM2 memory technology. The RTX will probably be slower because of its GDDR memory.
10^6 events, 7 dimensions, 0.3 s. Good luck getting the same numbers with Python :P
This can still be improved: the reduction of the arrays is done on the CPU (so the arrays are copied over just for that), and the grid refinement could also be done on the GPU (so that in the end only the final result is copied back).
Once both of these are done on the GPU, in both CUDA and OpenCL, we can think about more complicated integrands.
Cool.
Turns out writing CUDA code is very easy these days.
This is obviously not a final version, as I almost literally did `cat *.c > cpp-cuda.cu` and changed `malloc`s to `cudaMalloc`s. A few things to note,