akaashdash / xlstm

MIT License

Convolution question #2

Open · egormalyutin opened this issue 1 month ago

egormalyutin commented 1 month ago

Hello! Thank you for your implementation. I have a small question about how the convolution is used:

        x_norm = self.layer_norm(x)
        x_conv = F.silu(self.causal_conv(x_norm.unsqueeze(1)).squeeze(1))

Does this mean that your implementation is not actually convolving over the sequence? After reading the paper, I was left with the impression that you should project the 4 latest inputs and then apply swish to obtain i and f. Sorry if I'm wrong; I don't really use PyTorch.
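To make the question concrete, here is a minimal sketch. It assumes `x_norm` is a single time step of shape `(batch, input_size)` and that `causal_conv` is a one-channel `nn.Conv1d` with kernel size 4. That module is a hypothetical stand-in, and the repository's actual definition may differ. Under these assumptions, `unsqueeze(1)` turns the feature axis into the convolution's length axis. Each output feature then mixes up to 4 neighbouring features of the same time step, not the 4 latest time steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

KERNEL = 4

# Hypothetical stand-in for self.causal_conv: a one-channel Conv1d with
# kernel_size=4, padded by KERNEL - 1 and trimmed on the right so it is
# causal along its LAST axis. The repo's real definition may differ.
causal_conv = nn.Conv1d(1, 1, KERNEL, padding=KERNEL - 1, bias=False)
with torch.no_grad():
    causal_conv.weight.fill_(1.0)  # all-ones kernel makes the receptive field easy to read


def conv_step(x_norm: torch.Tensor) -> torch.Tensor:
    """Mimics the snippet for ONE time step: x_norm has shape (batch, input_size).

    Returns the pre-activation (before SiLU) with shape (batch, input_size).
    """
    inp = x_norm.unsqueeze(1)                        # (batch, 1, input_size): length axis = features
    out = causal_conv(inp)[..., : x_norm.shape[-1]]  # drop right padding -> causal over features
    return out.squeeze(1)                            # back to (batch, input_size)


# A single time step with a one-hot in feature 0 (batch of 1, 8 features).
x = torch.zeros(1, 8)
x[0, 0] = 1.0
with torch.no_grad():
    pre = conv_step(x)
    x_conv = F.silu(pre)  # same activation as in the snippet

print(pre.tolist())  # [[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
```

Feature 0 leaks into output features 0 to 3 of the same step. No earlier time step is involved, which matches what the question suspects.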

akaashdash commented 1 month ago

From my understanding, we convolve over the entire sequence, not just the last 4 inputs. I'm sure there are many ways to interpret the paper, so I'm open to discussing alternatives and trying them out.
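For comparison, a causal convolution over the time axis with the window size of 4 mentioned in the question means each output depends only on the 4 latest inputs. "Convolve over the entire sequence" and "use the last 4 inputs" can therefore describe the same operation. The sketch below assumes a depthwise `nn.Conv1d` (`groups=channels`) over inputs shaped `(batch, channels, seq_len)`. That layout is an assumption for illustration, not this repository's code. The sketch checks that one pass over the full sequence matches a step-by-step loop that keeps only a 4-input buffer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

KERNEL = 4
batch, channels, seq_len = 2, 3, 10

# Hypothetical depthwise causal convolution over TIME: input is
# (batch, channels, seq_len); each channel gets its own length-4 kernel.
conv = nn.Conv1d(channels, channels, KERNEL, groups=channels, padding=KERNEL - 1)

x = torch.randn(batch, channels, seq_len)

with torch.no_grad():
    # 1) Whole-sequence view: convolve over all time steps at once, then trim
    #    the right padding so output t only sees inputs t-3..t.
    full = conv(x)[..., :seq_len]

    # 2) Step-by-step view: keep a buffer of the 4 latest inputs (zeros before
    #    the sequence starts) and apply the same kernel to that window.
    window = torch.zeros(batch, channels, KERNEL)
    steps = []
    for t in range(seq_len):
        window = torch.cat([window[..., 1:], x[..., t : t + 1]], dim=-1)
        # No padding on a length-4 window -> exactly one output per channel.
        y_t = F.conv1d(window, conv.weight, conv.bias, groups=channels)
        steps.append(y_t)
    stepwise = torch.cat(steps, dim=-1)

print(torch.allclose(full, stepwise, atol=1e-6))  # True
```

The step-by-step form is how a recurrent, one-token-at-a-time implementation could keep the same convolution. It carries a small buffer of recent inputs instead of the whole sequence.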