N3PDF / mcgpu

Proof of concept of GPU integration

Cuda C #8

Closed scarlehoff closed 5 years ago

scarlehoff commented 5 years ago

Turns out writing CUDA code is very easy these days.

This is obviously not a final version as I almost literally did cat *.c > cpp-cuda.cu and changed mallocs to cudaMallocs.
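For readers who have not done this kind of port, here is a minimal sketch of what swapping malloc for cudaMalloc involves. It is not the code from this repository: scale_kernel and scale_on_gpu are hypothetical names, and the kernel body is a placeholder. The main point is that device memory cannot be touched from the host, so the port also needs explicit cudaMemcpy calls around the kernel launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by a constant factor.
__global__ void scale_kernel(double *data, int n, double factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host wrapper. Scales host_data in place using the GPU.
// Returns 0 on success and 1 on any CUDA error.
int scale_on_gpu(double *host_data, int n, double factor) {
    if (n <= 0) return 0;  // nothing to do, and a 0-block launch is invalid

    double *dev_data = NULL;
    size_t bytes = (size_t)n * sizeof(double);

    // CPU version: double *data = (double *)malloc(bytes);
    if (cudaMalloc((void **)&dev_data, bytes) != cudaSuccess) return 1;

    // The device buffer has to be filled explicitly from the host copy.
    if (cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev_data);
        return 1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev_data, n, factor);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev_data);
        return 1;
    }

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaError_t err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);

    // CPU version: free(data);
    cudaFree(dev_data);
    return err == cudaSuccess ? 0 : 1;
}
```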

A few things to note:

The Titan V is really good. Memory transfer feels fast; I would like to try the gamer-grade cards to see the difference.
Nvidia's Unified Memory just allocates memory on both the GPU and in RAM and keeps them synchronized (every time you tell it to synchronize).
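A small sketch of the Unified Memory pattern, assuming cudaMallocManaged is what is being used here. The names add_one and managed_increment are hypothetical. One pointer is valid on both host and device, and the host has to call cudaDeviceSynchronize before reading results back. As far as I know, on recent architectures the driver migrates pages on demand rather than keeping two full copies, so the description above is best read as a mental model.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: add 1 to every element.
__global__ void add_one(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

// Hypothetical helper: fills a managed array with 0..n-1 on the host,
// increments it on the GPU, and reports the first and last elements.
// Returns 0 on success and 1 on any CUDA error.
int managed_increment(int n, float *first, float *last) {
    if (n <= 0) return 1;

    float *x = NULL;
    // One allocation, one pointer, usable from both host and device code.
    if (cudaMallocManaged((void **)&x, (size_t)n * sizeof(float)) != cudaSuccess) return 1;

    // The host writes directly; no cudaMemcpy is needed.
    for (int i = 0; i < n; ++i) x[i] = (float)i;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(x, n);

    // The "tell it to synchronize" step: the host must wait for the
    // kernel before touching the managed memory again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        return 1;
    }

    *first = x[0];
    *last = x[n - 1];
    cudaFree(x);
    return 0;
}
```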

scarrazza commented 5 years ago

Looks good and runs well. I am pretty sure we should get similar results with AMD (ahaha opencl) because it uses the same memory technology, HBM2. The RTX will probably be slower due to its GDDR memory.

scarlehoff commented 5 years ago

10^6 events, 7 dimensions, 0.3s. Good luck getting to the same numbers with Python :P

This can still be improved because the reduction of the arrays is done on the CPU (so they are copied over just for that), and the refining of the grid could also be done on the GPU (so that in the end you only copy back the final result).

When both of these things are done on the GPU

[ ] Reduction
[ ] Refining

in both CUDA and OpenCL, we can think about having more complicated integrands.
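For the reduction item, a rough sketch of how the sum could stay on the device so that only a single double is copied back. This is a standard shared-memory tree reduction, not the implementation planned for this repository; block_sum, gpu_sum, THREADS and the cap of 64 blocks are all assumptions made for illustration. The input is assumed to already live in device memory (for example, the per-event weights).

```cuda
#include <cuda_runtime.h>

#define THREADS 256  // must be a power of two for the tree reduction below

// Each block accumulates a strided slice of `in` and writes one partial
// sum to out[blockIdx.x]. Must be launched with blockDim.x == THREADS.
__global__ void block_sum(const double *in, double *out, int n) {
    __shared__ double cache[THREADS];
    int tid = threadIdx.x;

    // Grid-stride loop: each thread adds up its own share of the input.
    double acc = 0.0;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        acc += in[i];
    cache[tid] = acc;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = cache[0];
}

// Hypothetical host helper: sums n doubles that are already on the device.
// Only sizeof(double) bytes travel back to the host.
// Returns 0 on success and 1 on any CUDA error.
int gpu_sum(const double *dev_in, int n, double *result) {
    int blocks = (n + THREADS - 1) / THREADS;
    if (blocks > 64) blocks = 64;  // arbitrary cap; the stride loop covers the rest
    if (blocks < 1) blocks = 1;

    // Scratch buffer: `blocks` partial sums followed by one slot for the total.
    double *dev_scratch = NULL;
    if (cudaMalloc((void **)&dev_scratch, (size_t)(blocks + 1) * sizeof(double)) != cudaSuccess)
        return 1;

    // Pass 1: one partial sum per block.
    block_sum<<<blocks, THREADS>>>(dev_in, dev_scratch, n);
    // Pass 2: a single block reduces the partial sums to one value.
    block_sum<<<1, THREADS>>>(dev_scratch, dev_scratch + blocks, blocks);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(result, dev_scratch + blocks, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dev_scratch);
    return err == cudaSuccess ? 0 : 1;
}
```

With something like this in place the host never sees the full array, which is the saving described above. Note that floating-point sums computed in a different order can differ from the CPU result in the last digits.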

scarrazza commented 5 years ago

Cool.