Open sylee0124 opened 1 year ago
These are correct. We are cleaning up the code/algorithms for the fast block FFT and the three-pass algorithm and consolidating them into a single package; this repository is focused on the architecture pieces for now. Will update this issue when released!
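For readers following the thread: the operation both kernels accelerate is a long convolution computed via FFT. A minimal NumPy sketch of that core operation is below, as a rough illustration only; the names are not the repo's API, and the fused CUDA kernel additionally keeps intermediates on-chip to avoid memory traffic.

```python
import numpy as np

def fft_conv(u, k):
    # Causal (linear) convolution of input u with kernel k via FFT.
    # Zero-pad to 2N so the circular convolution matches the linear one,
    # then keep the first N outputs.
    n = u.shape[-1]
    fft_size = 2 * n
    u_f = np.fft.rfft(u, n=fft_size)
    k_f = np.fft.rfft(k, n=fft_size)
    return np.fft.irfft(u_f * k_f, n=fft_size)[..., :n]
```

The fused kernels compute this same mapping in one pass instead of separate FFT, pointwise-multiply, and inverse-FFT launches.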
Thanks for verifying :) When can I expect this performance update? Will it happen anytime soon?
Hopefully soon! I’ve been traveling for a bit, but have some time to code again soon.
Hi, I'm a bit confused about the current implementations in this repo versus the implementations used/discussed in the related papers. I'll just state what I think is true. Please correct me if I'm wrong.

- FlashConv from H3: the fused kernel is implemented in fftconv_cuda.cu, but it does not use block FFT.
- FlashButterfly from "Simple Hardware-Efficient Long Convolutions for Sequence Modeling": long_conv.py uses BlockFFT (which is the same as the butterfly decomposition), with support for learnable parameters for dft_matrix. But it does not use a fused kernel, and the three-pass algorithm is also not implemented.
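For reference, the block FFT / butterfly decomposition mentioned above is the classic Cooley-Tukey factorization: an N-point DFT with N = n1 * n2 becomes two passes of small dense DFT matrix multiplies with twiddle factors in between. A minimal NumPy sketch, with names chosen for illustration (they are not the identifiers in long_conv.py); making the small DFT factors learnable, as FlashButterfly does, amounts to replacing the fixed matrices below with trainable parameters.

```python
import numpy as np

def dft_matrix(n):
    # Dense n-point DFT matrix; these small factors are what
    # FlashButterfly can treat as learnable parameters.
    i = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(i, i) / n)

def block_fft(x, n1, n2):
    # Two-pass (butterfly) decomposition of an N-point DFT, N = n1 * n2.
    n = n1 * n2
    X = x.reshape(n1, n2)            # view input as an n1 x n2 grid
    X = dft_matrix(n1) @ X           # pass 1: length-n1 DFTs down columns
    tw = np.exp(-2j * np.pi *
                np.outer(np.arange(n1), np.arange(n2)) / n)
    X = X * tw                       # twiddle factors between passes
    X = X @ dft_matrix(n2).T         # pass 2: length-n2 DFTs along rows
    return X.T.reshape(n)            # transpose to restore natural order
```

Each pass is a batched small matrix multiply, which maps well onto tensor cores; that is the efficiency argument for the block decomposition over a monolithic FFT at long sequence lengths.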