Closed J08nY closed 1 year ago
https://leimao.github.io/blog/CUDA-Stream/
https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
The last resource above is particularly interesting.
Using this in our case has to be clever, as we are trying to avoid hitting the GPU memory limit, which constrains how much we can parallelize: if you are simultaneously copying one input chunk to the device, computing on another input chunk, and copying out the output data, you need GPU memory for two input chunks and one output chunk. I think the way forward is to make the chunk size and the number of streams configurable, and then search this space for configurations that are viable (everything fits in memory) and fast (the GPU is saturated the most).
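A minimal sketch of that configuration search, with a deliberately simplified memory model. All names here are hypothetical (not existing pyecsca API), and the model assumes each in-flight stream holds one input chunk plus one output chunk on the device:

```python
# Hedged sketch: enumerate (chunk_size, n_streams) configurations and keep
# those whose peak GPU memory footprint fits a budget. The memory model is
# an assumption: each in-flight stream needs one input chunk + one output chunk.

def peak_memory_bytes(chunk_traces, n_streams, trace_len,
                      in_dtype_size=4, out_dtype_size=4, out_len=1):
    """Estimate peak device memory for a given chunking configuration."""
    in_chunk = chunk_traces * trace_len * in_dtype_size
    out_chunk = chunk_traces * out_len * out_dtype_size
    return n_streams * (in_chunk + out_chunk)

def viable_configs(budget_bytes, trace_len, chunk_sizes, stream_counts):
    """Return configurations that fit in the budget, largest in-flight data
    first, as a crude proxy for how well the GPU would be saturated."""
    fits = [(c, s) for c in chunk_sizes for s in stream_counts
            if peak_memory_bytes(c, s, trace_len) <= budget_bytes]
    return sorted(fits,
                  key=lambda cs: -peak_memory_bytes(cs[0], cs[1], trace_len))

if __name__ == "__main__":
    # Example: 1 GiB budget, traces of 100k float32 samples.
    for chunk, streams in viable_configs(1 << 30, 100_000,
                                         [256, 1024, 4096], [2, 3, 4])[:5]:
        print(chunk, streams, peak_memory_bytes(chunk, streams, 100_000))
```

The actual candidate configurations would then be benchmarked, since the footprint model above says nothing about copy/compute overlap efficiency.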
Currently, unless the HDF5 trace set "inplace" functionality is used, all trace data is loaded into memory, where it is operated on. This puts a limit on the size of the trace set. With HDF5 this loading can be delayed somewhat, but it will likely still happen once the data is read or computed on. Perhaps HDF5, or some other method of streaming the data, could be used to allow operating on large trace sets on both the CPU and GPU, since trace set size appears to be the major bottleneck.
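The streaming idea can be sketched as follows; this is a pure-stdlib stand-in (a raw binary file read in fixed-size pieces) for what HDF5 chunked reads, e.g. via h5py, would provide, and the function names are made up for illustration:

```python
# Hedged sketch of streaming trace data in fixed-size chunks instead of
# loading the whole trace set into memory. A real implementation would read
# HDF5 dataset slices; here a raw binary file of doubles stands in for it.
import array
import os
import tempfile

def stream_chunks(path, samples_per_chunk, typecode="d"):
    """Yield successive chunks of samples from a raw binary file,
    keeping only one chunk in memory at a time."""
    item_size = array.array(typecode).itemsize
    with open(path, "rb") as f:
        while True:
            buf = f.read(samples_per_chunk * item_size)
            if not buf:
                break
            chunk = array.array(typecode)
            chunk.frombytes(buf)
            yield chunk

if __name__ == "__main__":
    # Write a tiny demo "trace" and compute its mean chunk by chunk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        array.array("d", range(10)).tofile(f)
        path = f.name
    total = count = 0
    for chunk in stream_chunks(path, 4):
        total += sum(chunk)
        count += len(chunk)
    print(total / count)  # mean of 0..9 is 4.5
    os.unlink(path)
```

Computations would need to be expressed as chunk-wise (or streaming/online) operations for this to work, which fits the chunked GPU pipeline above.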