Digital-Defiance / nlp-metaformer

An ablation study on the transformer network for Natural Language Processing

custom c++ cuda kernel for the modified self-attention #66

Open RuiFilipeCampos opened 5 months ago

RuiFilipeCampos commented 5 months ago

This is critical for getting performance out of it: without a custom implementation, the modified layer is actually slower than scaled dot-product attention.
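For context, the baseline that the custom kernel is being compared against is standard scaled dot-product attention. A minimal NumPy sketch of that baseline is below (single head, no batching; the modified layer's exact form is not specified in this issue, so only the reference operation is shown):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_k) -- single head, no batch dim, for clarity
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (seq, seq) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                            # weighted sum of values, (seq, d_k)

# tiny smoke run
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Fused implementations of this operation (e.g. `torch.nn.functional.scaled_dot_product_attention`) avoid materializing the full score matrix in global memory, which is why an unfused variant written in plain framework ops tends to lose to it; a custom CUDA kernel is the usual way to close that gap.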