J08nY / pyecsca

Python Elliptic Curve Side-Channel Analysis toolkit.
https://pyecsca.org/
MIT License

Investigate trace data streaming on CPU/GPU #31

Closed J08nY closed 1 year ago

J08nY commented 1 year ago

Currently, unless the HDF5 trace set "inplace" functionality is used, all trace data is loaded into memory and operated on there. This limits the size of the trace set. With HDF5 the load can be deferred somewhat, but it will likely still happen once the data is read or computed on. Perhaps HDF5, or some other way of streaming the data, could be used to allow operating on large trace sets on both the CPU and the GPU, as trace set size seems to be the major bottleneck.
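To make the streaming idea concrete, here is a minimal sketch of chunked processing on the CPU. It is not pyecsca code: the flat float32 file layout and the names `iter_chunks` and `streaming_mean` are assumptions made for illustration, and only the standard library is used so the sketch runs anywhere. An HDF5 dataset read slice by slice could presumably fill the same role as the raw file here.

```python
import array
import os
import tempfile


def iter_chunks(path, n_samples, chunk_traces):
    """Yield lists of traces, at most `chunk_traces` traces at a time.

    Assumes (hypothetically) a flat file of float32 samples that holds
    whole traces of `n_samples` samples each, stored back to back.
    """
    with open(path, "rb") as f:
        while True:
            buf = array.array("f")
            try:
                buf.fromfile(f, n_samples * chunk_traces)
            except EOFError:
                pass  # fromfile keeps whatever items it managed to read
            if not buf:
                break
            yield [buf[i:i + n_samples] for i in range(0, len(buf), n_samples)]


def streaming_mean(path, n_samples, chunk_traces):
    """Compute the mean trace while holding only one chunk in memory."""
    sums = [0.0] * n_samples
    count = 0
    for chunk in iter_chunks(path, n_samples, chunk_traces):
        for trace in chunk:
            for i, value in enumerate(trace):
                sums[i] += value
            count += 1
    if count == 0:
        raise ValueError("no traces found")
    return [s / count for s in sums]


# Toy data: 5 traces of 3 samples each, trace t = [t, t + 1, t + 2].
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traces.bin")
    with open(path, "wb") as f:
        for t in range(5):
            array.array("f", [t, t + 1, t + 2]).tofile(f)
    mean = streaming_mean(path, n_samples=3, chunk_traces=2)

print(mean)
```

Peak memory here is governed by `chunk_traces` rather than by the total number of traces, which is the property the issue is after. This only works directly for operations that can be accumulated chunk by chunk, such as sums and means.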

J08nY commented 1 year ago

CUDA streams

https://leimao.github.io/blog/CUDA-Stream/

https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf

https://on-demand.gputechconf.com/gtc/2014/presentations/S4158-cuda-streams-best-practices-common-pitfalls.pdf

The last resource above is interesting.

Using this in our case has to be done cleverly, as we are trying to avoid hitting the GPU memory limit, which somewhat limits our parallelization: if you are simultaneously copying one input chunk to the device, computing on another input chunk, and copying out the output data, then you need GPU memory for 2 chunks and 1 output. I think the way forward is to make the chunk size and the number of streams configurable, and then search this space for viable configurations (everything fits in memory) and measure their speed (the GPU is as saturated as possible).
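The "fits in memory" half of that search can be sketched without a GPU. The memory model below is an assumption, not a measurement: it conservatively gives every stream its own input buffer and its own output buffer, which can only overestimate the "2 chunks and 1 output" count above. The names `Config`, `viable_configs` and `out_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    chunk_traces: int   # traces per chunk
    n_streams: int      # number of concurrent streams
    device_bytes: int   # estimated device memory needed


def viable_configs(budget_bytes, trace_bytes, chunk_options, stream_options,
                   out_ratio=1.0):
    """List (chunk size, stream count) pairs that fit in `budget_bytes`.

    Assumed model: each stream owns one input chunk buffer and one output
    buffer, with output size = out_ratio * input size.
    """
    configs = []
    for chunk_traces in chunk_options:
        in_bytes = chunk_traces * trace_bytes
        out_bytes = int(in_bytes * out_ratio)
        for n_streams in stream_options:
            needed = n_streams * (in_bytes + out_bytes)
            if needed <= budget_bytes:
                configs.append(Config(chunk_traces, n_streams, needed))
    return configs


# Toy numbers: 1000-byte budget, 100-byte traces.
configs = viable_configs(
    budget_bytes=1000,
    trace_bytes=100,
    chunk_options=[1, 2, 4],
    stream_options=[1, 2, 3],
)
for cfg in configs:
    print(cfg)
```

The speed half still has to be measured: each surviving configuration would be benchmarked on the actual device, since how well the GPU is saturated cannot be derived from the memory model alone.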