DeepTrackAI / deeplay


Add LayerDropoutSkipNormalization to Transformer models #82

Closed · JesusPinedaC closed this 6 months ago

JesusPinedaC commented 7 months ago

This pull request introduces the LayerDropoutSkipNormalization block to Transformer models. The block is flexible, allowing the order of its components to be changed easily, and it supports both Tensor and dictionary inputs.
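
For context, a minimal PyTorch-style sketch of what such a block can look like. The class structure, argument names, and dict handling here are illustrative assumptions, not the actual deeplay implementation:

```python
import torch
import torch.nn as nn


class LayerDropoutSkipNormalization(nn.Module):
    """Applies `layer`, dropout, a skip connection, and normalization
    in a configurable order. Accepts a Tensor or a dict with key "x".
    (Illustrative sketch only, not the deeplay implementation.)"""

    def __init__(self, layer, normalization, dropout=0.0,
                 order=("layer", "dropout", "skip", "normalization")):
        super().__init__()
        self.layer = layer
        self.dropout = nn.Dropout(dropout)
        self.normalization = normalization
        self.order = order

    def forward(self, inputs):
        # Support both plain tensors and dict inputs.
        x = inputs["x"] if isinstance(inputs, dict) else inputs
        residual = x
        for name in self.order:
            if name == "layer":
                x = self.layer(x)
            elif name == "dropout":
                x = self.dropout(x)
            elif name == "skip":
                x = x + residual
            elif name == "normalization":
                x = self.normalization(x)
        if isinstance(inputs, dict):
            return {**inputs, "x": x}
        return x
```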

BenjaminMidtvedt commented 7 months ago

@JesusPinedaC Is there an activation in there too?

JesusPinedaC commented 7 months ago

Not explicitly.

This block is used to define the two sub-modules of the transformer encoder layer.

The first comprises multihead attention (the layer, in this case) followed by dropout, a skip connection, and normalization. There is no activation at this point in any of the operations.

The second module is the feedforward one. Here (as defined in the original paper) the layer is an MLP consisting of two dense layers with a ReLU activation in between. Dropout, skip, and normalization are then carried out in the same way as in the attention module.

In short, the layer is where the core processing, including the non-linearities, is carried out.
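
For illustration, the two sub-modules could be composed roughly like this, building on the LayerDropoutSkipNormalization sketch above. The wrapper class, dimensions, and hyperparameters are illustrative assumptions, not the actual deeplay API:

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Wraps nn.MultiheadAttention so it fits the single-input `layer` slot."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attention(x, x, x)
        return out


dim, num_heads, hidden_dim = 64, 4, 256

# Sub-module 1: multihead attention, then dropout, skip, normalization.
attention_block = LayerDropoutSkipNormalization(
    layer=SelfAttention(dim, num_heads),
    normalization=nn.LayerNorm(dim),
    dropout=0.1,
)

# Sub-module 2: feedforward MLP (two dense layers with a ReLU in between),
# followed by the same dropout, skip, normalization sequence.
feedforward_block = LayerDropoutSkipNormalization(
    layer=nn.Sequential(
        nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim)
    ),
    normalization=nn.LayerNorm(dim),
    dropout=0.1,
)

x = torch.randn(8, 10, dim)  # (batch, sequence, features)
y = feedforward_block(attention_block(x))
```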

Should we call "layer" something different to be clearer?

BenjaminMidtvedt commented 7 months ago

Ok! Makes sense. We might want to differentiate the naming, but I'm not sure what we would call it instead. @giovannivolpe, any input? If we call slots for learnable modules like Conv2d "layer", should we have another name for general non-linear blocks that are layer + activation?

BenjaminMidtvedt commented 7 months ago

Actually, classically, a layer would be the learnable part plus the non-linearity, not just the learnable part. So maybe we should rename single learnable modules instead?

giovannivolpe commented 6 months ago

@JesusPinedaC @BenjaminMidtvedt I think it's reasonable to call a structure with a learnable part plus an activation a "layer". I think we can also keep calling single learnable modules "layer".

BenjaminMidtvedt commented 6 months ago

Ok, then it's good to merge from my side.