Abhranta opened this issue 2 months ago
The default activation used here is nn.Hardswish. I cannot find any mention of the GELU activation function, or of its quantized integer implementation, in the paper. Am I missing something here?

Basically, the hardware cost of nn.Hardswish is similar to that of the integer GELU, but its accuracy is better than the integer GELU's. So for now we only release the Hardswish version, which outperforms the GELU version.
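For anyone else comparing the two, here is a minimal sketch contrasting the standard floating-point nn.Hardswish and nn.GELU on a small tensor. It does not use the repository's quantized integer-GELU kernel (which is not released), only PyTorch's built-in modules, so it is just an illustration of how the two activations differ in shape.

```python
import torch
import torch.nn as nn

# Sample inputs spanning the region where the two activations differ most.
x = torch.linspace(-4.0, 4.0, steps=9)

hardswish = nn.Hardswish()  # default activation in this repo
gelu = nn.GELU()            # activation discussed in the question

print("x        :", x.tolist())
print("Hardswish:", hardswish(x).tolist())
print("GELU     :", gelu(x).tolist())
```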