Closed: kevindoran closed this issue 1 month ago
Hi, let me have a look.
Hi,
I have added a feed-forward layer into the Encoder block:

```python
self.feed_forward = nn.Sequential(
    nn.Linear(self.d_model, self.d_model * 2),
    nn.ReLU(),
    nn.Linear(self.d_model * 2, self.d_model)
)

self.stack_layers = nn.ModuleList(
    [EncoderLayer(
        self.d_model,
        MultiHeadAttention(self.n_head, self.d_model, self.d_model, self.dropout,
                           output_linear=False),
        use_residual=False,
        feed_forward=self.feed_forward,
        dropout=self.dropout
    ) for _ in range(self.n_layers)])
```
I see the MLP present by default now:
```
THP(
  (layer_type_emb): Embedding(1, 512, padding_idx=0)
  (layer_temporal_encoding): TimePositionalEncoding()
  (layer_intensity_hidden): Linear(in_features=512, out_features=1, bias=True)
  (softplus): Softplus(beta=1.0, threshold=20.0)
  (feed_forward): Sequential(
    (0): Linear(in_features=512, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=512, bias=True)
  )
  (stack_layers): ModuleList(
    (0-3): 4 x EncoderLayer(
      (self_attn): MultiHeadAttention(
        (linears): ModuleList(
          (0-2): 3 x Linear(in_features=512, out_features=512, bias=True)
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (feed_forward): Sequential(
        (0): Linear(in_features=512, out_features=1024, bias=True)
        (1): ReLU()
        (2): Linear(in_features=1024, out_features=512, bias=True)
      )
    )
  )
)
```
By the way, if you want to be able to reproduce the paper's results, it is worth noting that the THP paper treats the inner (feed-forward) dimension as configurable: this is clearest in the Supplemental, which gives three parameter sets, and if I understand correctly the inner dimension is not a hard-coded multiple of the model dimension. If the repo doesn't intend to reproduce the paper's design, it's probably worth noting this somewhere in the documentation.
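For illustration, one way to expose this would be to read the inner width from the config rather than hard-coding a multiple of `d_model`. A minimal sketch only; the `d_inner` name and `build_feed_forward` helper are hypothetical, not something the repo currently defines:

```python
import torch.nn as nn

def build_feed_forward(d_model: int, d_inner: int) -> nn.Sequential:
    # Position-wise FFN whose inner width comes from configuration instead of
    # being fixed to d_model * 2. Both names here are hypothetical.
    return nn.Sequential(
        nn.Linear(d_model, d_inner),
        nn.ReLU(),
        nn.Linear(d_inner, d_model),
    )

# e.g. with d_inner read from the experiment config alongside d_model / n_head:
# self.feed_forward = build_feed_forward(self.d_model, d_inner)
```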
Hi,
We are trying to reproduce the design and welcome issues to help us achieve it. We will try to make the configuration more flexible to accommodate various scenarios.
The shared `EncoderLayer` used by a few models (although I've only looked at `TorchTHP`) has a `use_residual` flag that defaults to `False`, and I don't think it is set to `True` when `TorchTHP` is instantiated, even though THP should have both attention and an MLP in each transformer layer. When `use_residual` is `False`, the feed-forward block is skipped (torch_baselayer.py#L85). A little test:
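(The test snippet itself isn't reproduced here; the following is only a sketch of the kind of check described, assuming the `EncoderLayer` / `MultiHeadAttention` signatures used in the snippet above and an import path that may differ from the repo's actual layout.)

```python
# Sketch only -- not the original test. Build an EncoderLayer the way TorchTHP
# does (use_residual=False, no feed_forward) and print it to see which
# sub-modules get registered. The import path below is an assumption.
from easy_tpp.model.torch_model.torch_baselayer import EncoderLayer, MultiHeadAttention

d_model, n_head, dropout = 512, 4, 0.1
layer = EncoderLayer(
    d_model,
    MultiHeadAttention(n_head, d_model, d_model, dropout, output_linear=False),
    use_residual=False,
    feed_forward=None,  # assuming None is accepted / the default
    dropout=dropout,
)
print(layer)  # expected to show only (self_attn), i.e. no MLP block
```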
Outputs:
As can be seen, the `EncoderLayer` contains `self_attn` and nothing else.