JCBrouwer opened this issue 1 year ago
Yes, this code path does not support sequence lengths longer than 8192 yet
On Tue, Mar 14, 2023 at 7:07 AM Hans Brouwer @.***> wrote:
Hello, thanks for the interesting research and open source repo!
I'm trying to integrate the HyenaOperator (with default settings) in a sequence modeling task and am running into the error in the title when using the fftconv extension.
My sequence (u in the trace below) has the shape (batch=10, channels=32, seq_len=8760) which apparently leads to an fft_size of 32768.
```
File ".../hyena.py", line 31, in fftconv_fused
    return fftconv_func(u, k, D, gelu=False, force_fp16_output=torch.is_autocast_enabled())
  File ".../extensions/fftconv/fftconv.py", line 175, in fftconv_func
    return FFTConvFunc.apply(
  File ".../extensions/fftconv/fftconv.py", line 98, in forward
    out = fftconv_fwd(
RuntimeError: Expected fft_size >= 16 && fft_size <= 16384 && (fft_size == 1 << int(log2(float(fft_size)))) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
```
Is the maximum supported sequence length 8192? Is this a theoretical/hardware limitation, or just a limitation of the current implementation? Would it be possible to support longer sequences?
Thanks!
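For anyone hitting this cap in the meantime, one workaround is to dispatch to a plain `torch.fft` convolution whenever the padded FFT size would exceed 16384. This is a minimal sketch, not the repo's API: the padding rule in `fft_size_for` is inferred from the 8760 → 32768 numbers above, and `fftconv_safe`/`fused_fn` are hypothetical names.

```python
import torch

FFT_SIZE_CAP = 16384  # upper bound from the kernel's assertion in the trace above


def fft_size_for(seq_len: int) -> int:
    # Assumption (consistent with seq_len=8760 -> fft_size=32768 in this issue):
    # the fused kernel pads to the next power of two >= 2 * seq_len.
    return max(16, 1 << (2 * seq_len - 1).bit_length())


def fftconv_ref(u, k, D):
    # Plain PyTorch rfft-based causal convolution; works for any sequence
    # length, just without the fused kernel's speed/memory benefits.
    # u: (batch, channels, seq_len), k: (channels, seq_len), D: (channels,)
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution equals the linear one
    y = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(k, n=n), n=n)[..., :L]
    return y + u * D.unsqueeze(-1)


def fftconv_safe(u, k, D, fused_fn=None):
    # Use the fused extension only when its fft_size constraint is satisfied.
    if fused_fn is not None and fft_size_for(u.shape[-1]) <= FFT_SIZE_CAP:
        return fused_fn(u, k, D)
    return fftconv_ref(u, k, D)
```

With this padding rule, `fft_size_for(8192) == 16384` lands exactly on the cap, which matches the reported 8192 maximum.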
@DanFu09 I also noticed that the fftconv extension here doesn't seem to achieve the speed gains claimed in the paper (it does give memory savings, though!)
Can you give more details on the workload you’re using to measure the speedup?
@DanFu09 It's a regular Transformer with the self-attention layers replaced by Hyena (using FFTConv). The overall training time per step doesn't seem to decrease when switching from the cuFFT-based PyTorch implementation to this extension; it might be dominated by other layers. Sequence length is ~1K.
Let me know if there are any specific details you're looking for.
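Since the per-step time may be dominated by other layers, one way to check whether the fused kernel itself is faster is to time just the convolution call in isolation. A rough sketch (assuming nothing about the repo's own benchmark scripts; `time_layer` is a hypothetical helper):

```python
import time
import torch


def time_layer(fn, *args, warmup=3, iters=20):
    # Rough wall-clock timing for a single layer. CUDA kernels launch
    # asynchronously, so we synchronize before reading the clock; otherwise
    # the measurement mostly reflects launch overhead.
    for _ in range(warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Timing the fused `fftconv_func` and the plain `torch.fft` path side by side on the same `(batch, channels, seq_len)` tensors would show whether the kernel speedup is simply hidden by the surrounding layers at seq_len ~1K.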