Open juwhan-k opened 7 months ago
Actually, we're experimenting with https://github.com/juwhankim/threadpool, a work-stealing queue. It shows about a 106x speedup compared to the jitscript-pytorch based implementation of ChaCha20.
For the time being, the CPU implementation and its integration will be the top-priority task.
There is currently a CPU-based implementation, but its speed is far from optimal.
Perhaps we could use the thread pool at https://github.com/juwhankim/another_cpp_thread and build a CPU equivalent of the CUDA-like implementation?
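
To make the idea concrete, here is a rough sketch of what a "CUDA-like" CPU version might look like. ChaCha20 keystream blocks depend only on the key, nonce, and block counter, so each block can be computed independently, much like one block per GPU thread. This is only an illustration under assumptions: it uses plain `std::thread` with a static split of the block range instead of the linked thread pool (whose API is not shown in this thread), and the names `chacha20_block` and `keystream_parallel` are hypothetical, not taken from either repository.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Key   = std::array<std::uint32_t, 8>;   // 256-bit key as little-endian words
using Nonce = std::array<std::uint32_t, 3>;   // 96-bit nonce as little-endian words
using Block = std::array<std::uint32_t, 16>;  // one 64-byte keystream block

inline std::uint32_t rotl(std::uint32_t v, int n) {
    return (v << n) | (v >> (32 - n));
}

inline void quarter_round(Block& s, int a, int b, int c, int d) {
    s[a] += s[b]; s[d] ^= s[a]; s[d] = rotl(s[d], 16);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = rotl(s[b], 12);
    s[a] += s[b]; s[d] ^= s[a]; s[d] = rotl(s[d], 8);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = rotl(s[b], 7);
}

// Computes one ChaCha20 block (20 rounds, RFC 8439 state layout).
Block chacha20_block(const Key& key, std::uint32_t counter, const Nonce& nonce) {
    const Block init = {0x61707865u, 0x3320646eu, 0x79622d32u, 0x6b206574u,
                        key[0], key[1], key[2], key[3],
                        key[4], key[5], key[6], key[7],
                        counter, nonce[0], nonce[1], nonce[2]};
    Block s = init;
    for (int i = 0; i < 10; ++i) {
        // Column rounds
        quarter_round(s, 0, 4, 8, 12);
        quarter_round(s, 1, 5, 9, 13);
        quarter_round(s, 2, 6, 10, 14);
        quarter_round(s, 3, 7, 11, 15);
        // Diagonal rounds
        quarter_round(s, 0, 5, 10, 15);
        quarter_round(s, 1, 6, 11, 12);
        quarter_round(s, 2, 7, 8, 13);
        quarter_round(s, 3, 4, 9, 14);
    }
    for (std::size_t i = 0; i < 16; ++i) s[i] += init[i];
    return s;
}

// Splits the block range into contiguous chunks, one per worker thread.
// Each worker writes only to its own slice of `out`, so no locking is needed.
std::vector<Block> keystream_parallel(const Key& key, const Nonce& nonce,
                                      std::uint32_t first_counter,
                                      std::size_t n_blocks, unsigned n_threads) {
    std::vector<Block> out(n_blocks);
    n_threads = std::max(1u, n_threads);
    const std::size_t chunk = (n_blocks + n_threads - 1) / n_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n_blocks, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = chacha20_block(
                    key, first_counter + static_cast<std::uint32_t>(i), nonce);
            }
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

With a real thread pool, the `workers.emplace_back(...)` call would presumably become a task submission, and a work-stealing queue would let smaller chunks balance themselves across cores instead of relying on the fixed split used here.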