ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

Question: softmax impl to support quantized data type #673

Closed xiaotongnii closed 1 year ago

xiaotongnii commented 1 year ago

In the Arm NN 22.02 operator list, Softmax on CpuRef and CpuAcc is listed as supporting the QASYMMS8/QASYMMU8 data types.

However, the Softmax implementation below computes its output in float. How does it handle QASYMMU8 data?

// Softmax implementation from src/backends/reference/workloads/Softmax.cpp

/// Computes the softmax function on some inputs, into outputs, with a shape given by tensorInfo.
void Softmax(Decoder<float>& in, Encoder<float>& out, const TensorInfo& inputTensorInfo, float beta, int axis)
{
    ARMNN_ASSERT_MSG(axis < static_cast<int>(inputTensorInfo.GetNumDimensions()),
                     "Required axis index greater than number of dimensions.");
    ARMNN_ASSERT_MSG(axis >= -static_cast<int>(inputTensorInfo.GetNumDimensions()),
                     "Required axis index lower than negative of the number of dimensions");

    unsigned int uAxis = axis < 0  ?
                         inputTensorInfo.GetNumDimensions() - static_cast<unsigned int>(abs(axis))
                         : static_cast<unsigned int>(axis);

    const TensorShape& inputShape = inputTensorInfo.GetShape();
    const unsigned int outerSize  = armnnUtils::GetNumElementsBetween(inputShape, 0, uAxis);
    const unsigned int axisSize   = inputShape[uAxis];
    const unsigned int innerSize  = armnnUtils::GetNumElementsBetween(inputShape,
                                                                      uAxis + 1,
                                                                      inputShape.GetNumDimensions());

    for (unsigned int outer = 0; outer < outerSize; ++outer)
    {
        unsigned int inputBeginIdx  = outer * axisSize * innerSize;
        unsigned int inputEndIdx    = inputBeginIdx + axisSize * innerSize;
        unsigned int outputBeginIdx = outer * axisSize * innerSize;

        for (unsigned int inner = 0; inner < innerSize; ++inner, ++inputBeginIdx, ++inputEndIdx, ++outputBeginIdx)
        {
            // Find max
            float maxValue = std::numeric_limits<float>::lowest();
            for (unsigned int iter = inputBeginIdx; iter < inputEndIdx; iter += innerSize)
            {
                in[iter];                                 // operator[] positions the decoder at element 'iter'
                maxValue = std::max(maxValue, in.Get());  // Get() reads that element as a float (dequantizing if needed)
            }

            // Compute sum
            float sum = 0.0f;
            for (unsigned int iter = inputBeginIdx; iter < inputEndIdx; iter += innerSize)
            {
                in[iter];
                sum += std::exp((in.Get() - maxValue) * beta);
            }

            // Compute result
            unsigned int outputIter = outputBeginIdx;
            out[outputIter];
            for (unsigned int iter = inputBeginIdx; iter < inputEndIdx; iter += innerSize, outputIter += innerSize)
            {
                out[outputIter]; // Position the encoder on the output element
                in[iter];        // Position the decoder on the input element
                out.Set(std::exp((in.Get() - maxValue) * beta) / sum); // Set() re-quantizes the float result for quantized outputs
            }
        }
    }
}
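
For context, this function is not called on raw quantized buffers directly; the reference workload wraps the input and output memory in a Decoder/Encoder pair first. The sketch below is paraphrased from memory rather than copied from the repository, so treat the wrapper name (RunSoftmaxSketch) and the exact helper signatures as assumptions and check RefSoftmaxWorkload.cpp for the real wiring:

// Paraphrased sketch of how the reference workload might feed quantized tensors
// into the float-space Softmax() above. Not verbatim Arm NN code.
#include <memory>

#include "Decoders.hpp"   // armnn::MakeDecoder<float>
#include "Encoders.hpp"   // armnn::MakeEncoder<float>
#include "Softmax.hpp"    // armnn::Softmax

void RunSoftmaxSketch(const armnn::TensorInfo& inputInfo,  const void* inputData,
                      const armnn::TensorInfo& outputInfo, void*       outputData,
                      float beta, int axis)
{
    // MakeDecoder/MakeEncoder pick a concrete iterator based on the tensor's DataType,
    // e.g. a QAsymmU8 decoder for quantized inputs, so Softmax() itself only ever sees floats.
    std::unique_ptr<armnn::Decoder<float>> decoder = armnn::MakeDecoder<float>(inputInfo, inputData);
    std::unique_ptr<armnn::Encoder<float>> encoder = armnn::MakeEncoder<float>(outputInfo, outputData);

    armnn::Softmax(*decoder, *encoder, inputInfo, beta, axis);
}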
MikeJKelly commented 1 year ago

Hi @Shelton-N

That's the code for the reference implementation, which performs all of its calculations in float space. The implementation here uses a Decoder for the input and an Encoder for the output. The Decoder will convert any DataType to float and the Encoder will convert from float back to any other DataType. The performance isn't great, but the reference backend is not intended for any purpose other than testing.
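
To make that concrete, here is a minimal, self-contained sketch of what a QAsymmU8 decoder and encoder effectively do. It mirrors the idea of the iterators in BaseIterator.hpp, but the struct and field names below are illustrative stand-ins, not Arm NN's:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Simplified stand-ins for Arm NN's QAsymmU8 decoder/encoder: Get() dequantizes the
// stored uint8 to float, Set() quantizes a float back to uint8. Names are illustrative.
struct QAsymmU8DecoderSketch
{
    const uint8_t* data; float scale; int32_t offset; unsigned int pos = 0;

    QAsymmU8DecoderSketch& operator[](unsigned int index) { pos = index; return *this; }
    float Get() const { return scale * static_cast<float>(static_cast<int32_t>(data[pos]) - offset); }
};

struct QAsymmU8EncoderSketch
{
    uint8_t* data; float scale; int32_t offset; unsigned int pos = 0;

    QAsymmU8EncoderSketch& operator[](unsigned int index) { pos = index; return *this; }
    void Set(float value)
    {
        int32_t q = static_cast<int32_t>(std::round(value / scale)) + offset;
        data[pos] = static_cast<uint8_t>(std::min(255, std::max(0, q))); // clamp to the uint8 range
    }
};

So the float arithmetic in Softmax.cpp never needs to know the tensor was quantized: the decoder dequantizes on Get() using the input's scale and offset, and the encoder re-quantizes on Set() using the output's.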

For CpuAcc we use NeonSoftmaxWorkload, which uses the NESoftmaxLayer from ComputeLibrary and has far better performance.
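
If you want to see what the quantized path looks like outside of Arm NN, a standalone use of ComputeLibrary's NESoftmaxLayer would look roughly like the sketch below. It is based on the public ACL API as I recall it; the shape, scales and offsets are made-up example values:

// Rough standalone sketch of quantized softmax via ComputeLibrary's NESoftmaxLayer.
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NESoftmaxLayer.h"
#include "arm_compute/runtime/Tensor.h"

int main()
{
    using namespace arm_compute;

    // QASYMM8 input/output tensors; the scale/offset values here are arbitrary examples.
    Tensor input, output;
    input.allocator()->init(TensorInfo(TensorShape(10U), 1, DataType::QASYMM8,
                                       QuantizationInfo(0.05f, 128)));
    output.allocator()->init(TensorInfo(TensorShape(10U), 1, DataType::QASYMM8,
                                        QuantizationInfo(1.0f / 256.0f, 0)));

    // Configure softmax with beta = 1.0; the quantized kernels are handled internally.
    NESoftmaxLayer softmax;
    softmax.configure(&input, &output, 1.0f);

    input.allocator()->allocate();
    output.allocator()->allocate();

    // ... fill input.buffer() with quantized data ...
    softmax.run();
    return 0;
}

As far as I recall, ACL expects the QASYMM8 softmax output to use a fixed quantization of scale 1/256 and offset 0, which is why the output TensorInfo above uses those values.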

Best regards, Mike

xiaotongnii commented 1 year ago

@MikeJKelly, thanks. I will check the CpuAcc softmax workload implementation.