Closed (AnFreTh closed this issue 2 weeks ago)
Hello,
Yes, that's expected because you're running on the CPU. CPUs are good at doing many small sequential computations very fast.
On the GPU, it's preferable to do fewer but bigger computations (sequentially speaking). That's exactly what the parallel scan is designed for: instead of L small sequential steps, it needs only about log2(L) bigger ones (given enough parallelization).
So if you're on CPU, yes, it may be a good idea to stay with the sequential scan! I should put that in the README. With bigger dimensions, though, pscan should eventually be better even on the CPU.
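To make the step counts concrete, here is a minimal pure-Python sketch of the recurrence h_t = a_t * h_(t-1) + b_t computed both ways. It is only an illustration under stated assumptions: the names `combine`, `sequential_scan` and `parallel_scan` are made up for this example, the parallel version is a simple Hillis-Steele-style scan (not necessarily the variant pscan implements), and the "parallel" rounds are simulated with a list comprehension rather than run on parallel hardware.

```python
import math


def combine(left, right):
    """Compose two steps h -> a*h + b, applying `left` first and `right` second."""
    a1, b1 = left
    a2, b2 = right
    # right(left(h)) = a2*(a1*h + b1) + b2 = (a1*a2)*h + (a2*b1 + b2)
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a, b, h0=0.0):
    """Plain for loop: L small steps, each depending on the previous one."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out, len(a)  # (states, number of sequential steps)


def parallel_scan(a, b, h0=0.0):
    """Hillis-Steele-style inclusive scan (illustrative, simulated on one core)."""
    elems = list(zip(a, b))
    L = len(elems)
    rounds, offset = 0, 1
    while offset < L:
        # Every entry of this round depends only on the previous round,
        # so on parallel hardware all L entries could be computed at once.
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(L)
        ]
        offset *= 2
        rounds += 1
    # elems[i] now holds (A_i, B_i) such that h_i = A_i * h0 + B_i
    out = [A * h0 + B for A, B in elems]
    return out, rounds  # (states, number of sequential rounds)


if __name__ == "__main__":
    L = 320
    a = [0.9 + 0.001 * (i % 7) for i in range(L)]
    b = [float(i % 5) for i in range(L)]
    seq, seq_steps = sequential_scan(a, b)
    par, par_rounds = parallel_scan(a, b)
    print(seq_steps, par_rounds, math.ceil(math.log2(L)))
    print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9) for x, y in zip(seq, par)))
```

Both functions return the same states, but each round of this parallel version touches all L elements, so its total work is roughly L * log2(L) operations versus L for the loop. Without hardware to run a round's elements simultaneously, that extra work is pure overhead, which is consistent with the for loop winning on the CPU.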
Hope this helps
Thanks for the reply! That answers it. I will close the issue.
Great repo!
I was wondering about the speed of pscan compared to a simple for loop. I assumed pscan would be faster in any scenario. However, running the selective scan steps on the CPU is faster with the for loop from the selective_scan_seq variant than with pscan.
Simulate the data -> Sequence length 320
With pscan
Around 10 seconds on cpu
With seq (for loop)
Around 2 seconds.
Is this expected, and do the speed advantages only come into play during training (backward passes)?