This is the implementation with MetalPerformanceShadersGraph. Core ML may fall back to CPU-only inference for some models, such as VITS, so the MPS implementation is necessary for GPU inference.
```swift
import MetalPerformanceShadersGraph

/// Numerically stable softplus: (1 / beta) * log(1 + exp(beta * x)),
/// falling back to the identity where beta * x exceeds `threshold`.
func softplus(graph: MPSGraph, x: MPSGraphTensor, beta: Double = 1, threshold: Double = 20) -> MPSGraphTensor {
    assert(x.dataType != .int32)
    let b = graph.constant(beta, dataType: x.dataType)
    let t = graph.constant(threshold, dataType: x.dataType)

    let xb = graph.multiplication(x, b, name: nil)
    // mask0 = 1 where beta * x <= threshold, else 0
    let m = graph.lessThanOrEqualTo(xb, t, name: nil)
    let m0 = graph.cast(m, to: x.dataType, name: nil)
    // (1 / beta) * log(mask0 + exp(beta * x * mask0));
    // the masked exponent cannot overflow above the threshold
    let xb0 = graph.multiplication(xb, m0, name: nil)
    let exp = graph.exponent(with: xb0, name: nil)
    let exp0 = graph.addition(exp, m0, name: nil)
    let log = graph.logarithm(with: exp0, name: nil)
    let log0 = graph.division(log, b, name: nil)
    // mask1 = NOT mask0, via NOR(m, m)
    let m1 = graph.logicalNOR(m, m, name: nil)
    let x0 = graph.multiplication(
        x, graph.cast(m1, to: x.dataType, name: nil),
        name: nil
    )
    // softplus below the threshold, identity above it
    return graph.addition(log0, x0, name: nil)
}
```
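The graph above cannot run outside Metal, but the same op sequence can be mirrored elementwise for a quick sanity check. This plain-Python sketch (written for this comment, not part of the PR) follows the graph op for op:

```python
import math

def softplus_masked(x, beta=1.0, threshold=20.0):
    """Elementwise mirror of the MPSGraph ops in the Swift function."""
    xb = x * beta
    m0 = 1.0 if xb <= threshold else 0.0  # lessThanOrEqualTo + cast
    exp0 = math.exp(xb * m0) + m0         # exponent + addition
    log0 = math.log(exp0) / beta          # logarithm + division
    m1 = 1.0 - m0                         # logicalNOR(m, m) == NOT m
    return log0 + x * m1

# below the threshold: ordinary softplus; above it: identity, no overflow
print(softplus_masked(0.0))     # log(2) ~= 0.6931
print(softplus_masked(1000.0))  # 1000.0 (math.exp(1000.0) would overflow)
```

Note that the masking keeps `exp` bounded: above the threshold the argument becomes `exp(0)`, so no intermediate value overflows even for very large inputs.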
If $\beta x_i \le \text{threshold}$, then $mask^0_i = 1$, $mask^1_i = 0$:
$$
\begin{aligned}
\mathrm{softplus}(x_i) &= \frac{1}{\beta} \log\!\left(mask^0_i + \exp(\beta x_i \, mask^0_i)\right) + x_i \, mask^1_i \\
&= \frac{1}{\beta} \log\!\left(1 + \exp(\beta x_i)\right) + 0 \\
&= \frac{1}{\beta} \log\!\left(1 + \exp(\beta x_i)\right)
\end{aligned}
$$
If $\beta x_i > \text{threshold}$, then $mask^0_i = 0$, $mask^1_i = 1$:
$$
\begin{aligned}
\mathrm{softplus}(x_i) &= \frac{1}{\beta} \log\!\left(mask^0_i + \exp(\beta x_i \, mask^0_i)\right) + x_i \, mask^1_i \\
&= \frac{1}{\beta} \log\!\left(0 + \exp(0)\right) + x_i \\
&= \frac{1}{\beta} \log(1) + x_i \\
&= 0 + x_i \\
&= x_i
\end{aligned}
$$
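The identity branch loses essentially nothing in double precision: once $\beta x_i$ passes the default threshold of 20, $\frac{1}{\beta}\log(1 + e^{\beta x_i})$ and $x_i$ already agree to about $10^{-9}$. A quick check (plain Python, not part of the PR):

```python
import math

beta, x = 1.0, 21.0  # beta * x just above the default threshold of 20
exact = math.log1p(math.exp(beta * x)) / beta
# the correction term log(1 + exp(-beta*x)) / beta is ~7.6e-10 here,
# so returning x in this branch is accurate to roughly 1e-9
print(exact - x)
```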
Thanks for the contribution @M-Quadra
In the previous implementation, `x` was required to be at least rank 3. According to the PyTorch documentation, the input to `softplus` has no rank restriction. This PR also adds an implementation of the `threshold` parameter in certain cases.