Closed Zkx199800 closed 5 months ago
Hi, `torch.nn.TransformerEncoderLayer` has an input parameter called `batch_first`, which is set to `False` by default. See the official docs for `torch.nn.TransformerEncoderLayer` at https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html#transformerencoderlayer for more details.
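To make the shape convention concrete, here is a minimal sketch (not from this project's code) showing how `batch_first` changes the expected input layout; the dimension names `batch`, `seq`, and `dim` are illustrative, with `seq` playing the role of `h x w`:

```python
import torch
import torch.nn as nn

batch, seq, dim = 4, 16, 32  # seq stands in for h x w

# Default batch_first=False: input is [seq, batch, dim],
# so attention mixes information along dim 0 (the seq axis).
layer_seq_first = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=False)
out_seq_first = layer_seq_first(torch.randn(seq, batch, dim))
print(out_seq_first.shape)  # torch.Size([16, 4, 32])

# batch_first=True: input is [batch, seq, dim] instead.
layer_batch_first = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
out_batch_first = layer_batch_first(torch.randn(batch, seq, dim))
print(out_batch_first.shape)  # torch.Size([4, 16, 32])
```

In both cases the output has the same shape as the input; only the interpretation of the first two axes differs.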
The original code in this project has these comments:

```
""" Implementation for inter-sample self-attention
    input size for the encoder_layers: [batch, h x w, dim] """

""" Implementation for intra-sample self-attention
    input size for the encoder_layers: [h x w, batch, dim] """
```

I would have expected the opposite: the input size for the inter-sample self-attention `encoder_layers` to be `[h x w, batch, dim]`, and for the intra-sample self-attention to be `[batch, h x w, dim]`. I'm confused about this; could you please help me understand?