Open Abhranta opened 4 weeks ago
For LayerNorm, in the end we did not choose the integer version. Later tests showed that the integer version occupies more hardware resources, which makes it an inefficient choice. So instead we overlap this operator with others to hide its execution time.
How does the `LayerNormActQ` class actually perform normalization? Moreover, where is it used in a model? All the models use the following for layer norm:

`norm_layer=partial(nn.LayerNorm, eps=1e-6)`
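For what it's worth, a class with a name like `LayerNormActQ` typically keeps the normalization itself in floating point and only fake-quantizes the *activations* at the output, which is consistent with the maintainer's note that a fully integer LayerNorm was dropped. The sketch below is purely illustrative — the class name, `n_bits` parameter, and per-tensor scaling scheme are assumptions, not the repo's actual implementation:

```python
import torch
import torch.nn as nn


class LayerNormActQ(nn.Module):
    """Illustrative sketch (NOT the repo's code): float LayerNorm
    followed by symmetric fake-quantization of the activations."""

    def __init__(self, dim, eps=1e-6, n_bits=8):
        super().__init__()
        self.ln = nn.LayerNorm(dim, eps=eps)  # normalization stays in float
        self.n_bits = n_bits                  # assumed bit-width parameter

    def forward(self, x):
        x = self.ln(x)
        # Fake-quantize: map to an integer grid, round, then rescale back
        # to float so downstream layers see quantization error in training.
        qmax = 2 ** (self.n_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
```

If this guess is right, such a class would be swapped in for the `partial(nn.LayerNorm, eps=1e-6)` factory shown above when building a quantized variant of the model; checking where `norm_layer` is passed in the model constructors should show whether that substitution actually happens.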