HazyResearch / safari

Convolutions for Sequence Modeling
Apache License 2.0

RuntimeError: Expected fft_size >= 16 && fft_size <= 16384 #6

Open JCBrouwer opened 1 year ago

JCBrouwer commented 1 year ago

Hello, thanks for the interesting research and open source repo!

I'm trying to integrate the HyenaOperator (with default settings) in a sequence modeling task and am running into the error in the title when using the fftconv extension.

My sequence (u in the trace below) has the shape (batch=10, channels=32, seq_len=8760) which apparently leads to an fft_size of 32768.

  File ".../hyena.py", line 31, in fftconv_fused
    return fftconv_func(u, k, D, gelu=False, force_fp16_output=torch.is_autocast_enabled())
  File ".../extensions/fftconv/fftconv.py", line 175, in fftconv_func
    return FFTConvFunc.apply(
  File ".../extensions/fftconv/fftconv.py", line 98, in forward
    out = fftconv_fwd(
RuntimeError: Expected fft_size >= 16 && fft_size <= 16384 && (fft_size == 1 << int(log2(float(fft_size)))) to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
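
As far as I can tell, the fft_size comes from padding to the next power of two that is at least twice the sequence length. A rough reconstruction based on the error message and traceback (not the exact kernel source):

```python
def expected_fft_size(seq_len: int) -> int:
    # Assumed behavior: pad to the next power of two >= 2 * seq_len
    # (so the circular FFT convolution acts as a linear one), with a floor of 16.
    n = 2 * seq_len
    return max(1 << (n - 1).bit_length(), 16)  # next power of two >= n

print(expected_fft_size(8760))  # 32768 -> above the 16384 limit in the check
print(expected_fft_size(8192))  # 16384 -> the largest size the check accepts
```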

Is the maximum supported sequence length 8192? Is this a theoretical or hardware limitation, or just a limitation of the current implementation? Would it be possible to support longer sequences?

Thanks!

DanFu09 commented 1 year ago

Yes, this code path does not support sequence lengths longer than 8192 yet.
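
For longer sequences you can use the plain PyTorch (cuFFT) convolution path instead. A minimal sketch of that unfused computation, assuming u of shape (batch, channels, seq_len), a filter k of shape (channels, seq_len), and a skip weight D of shape (channels,) — illustrative, not the repo's exact reference implementation:

```python
import torch

def fftconv_torch_sketch(u, k, D):
    # Zero-pad to 2 * seq_len so the circular FFT convolution is linear.
    seq_len = u.shape[-1]
    fft_size = 2 * seq_len
    u_f = torch.fft.rfft(u.float(), n=fft_size)          # (B, C, fft_size // 2 + 1)
    k_f = torch.fft.rfft(k.float(), n=fft_size)          # (C, fft_size // 2 + 1)
    y = torch.fft.irfft(u_f * k_f, n=fft_size)[..., :seq_len]
    return (y + u * D.unsqueeze(-1)).to(dtype=u.dtype)   # add the skip term D * u
```

This path has no power-of-two or 8192-length restriction; it is just slower and less memory-efficient than the fused kernel.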

calclavia commented 1 year ago

@DanFu09 I also noticed that the fftconv extension here doesn't seem to achieve the speed gains claimed in the paper (it does give memory savings, though!)

DanFu09 commented 1 year ago

Can you give more details on the workload you’re using to measure the speedup?

calclavia commented 1 year ago

@DanFu09 It's a regular Transformer with the self-attention layers replaced by Hyena (using FFTConv). The overall training time per step doesn't seem to decrease when switching between the cuFFT PyTorch implementation and this extension; it might be dominated by other layers. Sequence length is ~1K.

Let me know if there are any specific details you're looking for.
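
For reference, here is a rough sketch of how one could time the two code paths in isolation with CUDA events, to separate the convolution's cost from the rest of the training step (illustrative, nothing repo-specific):

```python
import torch

def time_op(fn, *args, warmup=10, iters=100):
    # Time a single op with CUDA events so the measurement isn't
    # dominated by the other layers in the Transformer step.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call
```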