In Algorithm 3 of the paper, a Hyena operator of order N has (N+1) projections and N filters. With order=2, it returns
mlp2(x) * FFTConv(mlp1(x) * FFTConv(mlp0(x), filter0), filter1)
However, in the implementation (e.g. https://github.com/HazyResearch/safari/blob/4f5972cee773650d311cc454d69862f1897954f0/standalone_hyena.py#L244), a Hyena operator of order N has (N+1) projections and (N-1) filters. For example, with order=2 the code computes
mlp2(x) * FFTConv(mlp0(x) * mlp1(x), filter0)
i.e., for order=N there are only (N-1) FFTConv applications.
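To make the difference concrete, here is a minimal NumPy sketch of the two recurrences as I read them. It is not the repository's code: `fft_conv`, `hyena_paper`, and `hyena_code` are hypothetical names, and the projections and filters are random arrays standing in for `mlp_i(x)` and `filter_i`.

```python
import numpy as np

def fft_conv(u, k):
    """Causal long convolution of u with filter k via FFT (both of length L)."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution does not wrap around
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[..., :L]

def hyena_paper(projs, filters):
    """Paper-style reading: N+1 projections, N filters, N FFTConv calls."""
    assert len(filters) == len(projs) - 1
    z, calls = projs[0], 0
    for x_n, h_n in zip(projs[1:], filters):
        z = x_n * fft_conv(z, h_n)
        calls += 1
    return z, calls

def hyena_code(projs, filters):
    """Code-style reading: N+1 projections, N-1 filters, N-1 FFTConv calls."""
    assert len(filters) == len(projs) - 2
    z, calls = projs[0] * projs[1], 0  # first two projections are gated directly
    for x_n, h_n in zip(projs[2:], filters):
        z = x_n * fft_conv(z, h_n)
        calls += 1
    return z, calls

# order=2: three stand-in projections of length L
rng = np.random.default_rng(0)
L = 16
projs = [rng.standard_normal(L) for _ in range(3)]
f0, f1 = rng.standard_normal(L), rng.standard_normal(L)

y_paper, n_paper = hyena_paper(projs, [f0, f1])  # n_paper == 2
y_code, n_code = hyena_code(projs, [f0])         # n_code == 1
print(n_paper, n_code)
```

In this sketch the code-style output equals the paper-style output when the first filter is a unit impulse (`[1, 0, 0, ...]`), so the two forms differ by exactly one learned long convolution.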
Is this intentional, or am I missing something (the code is quite convoluted)?
A lot of the experiments were done with order=2. Does that mean one application of FFTConv per layer is enough?