ermig1979 / Simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

AlphaBlend Performance issues #203

Closed: HowToExpect closed this issue 2 years ago.

HowToExpect commented 2 years ago
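#include <chrono>
#include <cstdint>
#include <cstdio>
#include <iostream>

// OpenCV legacy C API as in OpenCV 2.4 (IplImage, cvLoadImage, cvSplit, ...).
#include <opencv2/core/core_c.h>
#include <opencv2/highgui/highgui_c.h>

// The internal Simd header that declares Avx2::AlphaBlending is also needed;
// that function is not part of the public Simd API.

// Monotonic clock in microseconds, used for the timing in main().
inline long long getCurrentTimeMicro()
{
    return std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now().time_since_epoch()).count();
}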

static void InnerMixtureImage(IplImage* pSrc, IplImage* pDst, int xpos, int ypos)
{
    if (!pSrc || !pDst)
        return;

    if (pSrc->nChannels != 4 || pDst->nChannels != 4)
        return;

    int w = xpos + pSrc->width;
    int h = ypos + pSrc->height;

    if (w > pDst->width || h > pDst->height)
    {
        printf("<WARNING> %s: src width = %d, height = %d, dst width = %d, height = %d, pos x = %d, y = %d\r\n",
            __FUNCTION__, pSrc->width, pSrc->height, pDst->width, pDst->height, xpos, ypos);
        return;
    }

    int i, j;
    for (j = 0; j < pSrc->height; ++j)
    {

        unsigned char* pucDst = (unsigned char*)pDst->imageData + (j + ypos) * pDst->widthStep + xpos * pDst->nChannels;
        unsigned char* pucSrc = (unsigned char*)pSrc->imageData + j * pSrc->widthStep;

        for (i = 0; i < pSrc->width; ++i)
        {
            unsigned char alpha = pucSrc[3];

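            if (alpha == 0)
            {
                // Fully transparent: leave the destination pixel unchanged.
            }
            else if (alpha == 255)
            {
                // Fully opaque: copy the source pixel.
                pucDst[0] = pucSrc[0];
                pucDst[1] = pucSrc[1];
                pucDst[2] = pucSrc[2];
                pucDst[3] = pucSrc[3];
            }
            else
            {
                // Partial alpha: blend. Note that ">> 8" divides by 256, not 255,
                // so results come out slightly too dark (255 over 255 gives 254).
                pucDst[0] = (pucDst[0] * (255 - alpha) + pucSrc[0] * alpha) >> 8;
                pucDst[1] = (pucDst[1] * (255 - alpha) + pucSrc[1] * alpha) >> 8;
                pucDst[2] = (pucDst[2] * (255 - alpha) + pucSrc[2] * alpha) >> 8;
                // Keep the larger of the two alpha values.
                pucDst[3] = pucDst[3] > alpha ? pucDst[3] : alpha;
            }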
            pucDst += 4;
            pucSrc += 4;
        }
    }

}

int main()
{
    // Load the overlay together with its alpha channel.
    IplImage* pUpdate = cvLoadImage("update.png", CV_LOAD_IMAGE_UNCHANGED);
    if (!pUpdate)
        return 1;
    if (pUpdate->nChannels != 4)
    {
        cvReleaseImage(&pUpdate);
        return 1;
    }

    int width = pUpdate->width;
    int height = pUpdate->height;

    // Opaque dark-gray 4-channel background of the same size.
    IplImage* pImg = cvCreateImage(cvSize(width, height), 8, 4);
    cvSet(pImg, cvScalar(51, 51, 51, 255));

    // Split the overlay into planes; pChannel4 is its alpha plane.
    IplImage* pChannel1 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
    IplImage* pChannel2 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
    IplImage* pChannel3 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
    IplImage* pChannel4 = cvCreateImage(cvGetSize(pUpdate), 8, 1);

    cvSplit(pUpdate, pChannel1, pChannel2, pChannel3, pChannel4);

    int64_t start = getCurrentTimeMicro();

    Avx2::AlphaBlending((const uint8_t*)pUpdate->imageData, pUpdate->widthStep, pUpdate->width, pUpdate->height, pUpdate->nChannels,
                      (const uint8_t*)pChannel4->imageData, pChannel4->widthStep, (uint8_t*)pImg->imageData, pImg->widthStep);

    //InnerMixtureImage(pUpdate, pImg, 0, 0);
    int64_t end = getCurrentTimeMicro();

    std::cout << "elapsed time = " << (end - start) << " us" << std::endl;

    cvReleaseImage(&pChannel1);
    cvReleaseImage(&pChannel2);
    cvReleaseImage(&pChannel3);
    cvReleaseImage(&pChannel4);
    cvReleaseImage(&pImg);
    cvReleaseImage(&pUpdate);
    return 0;
}
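Two details in the code above are worth checking before comparing outputs. The `>> 8` in InnerMixtureImage divides by 256, so blended values come out slightly too dark, and InnerMixtureImage keeps the larger alpha in channel 3 instead of blending it, so its result may not match the SIMD call even on the same input. Below is a minimal scalar reference that takes the same arguments as the Avx2::AlphaBlending call above and blends every channel with a rounded division by 255 (Jim Blinn's add-and-shift trick). The names `DivideBy255Rounded` and `AlphaBlendingReference` are hypothetical, and the per-channel formula `dst = (src * alpha + dst * (255 - alpha)) / 255` is an assumption about what the Simd functions compute; check the SimdAlphaBlending documentation in SimdLib.h before relying on exact equality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounded division by 255 without a divide instruction (Blinn's trick).
// Equal to round(x / 255.0) for every x in [0, 255 * 255], which covers
// src * alpha + dst * (255 - alpha) for 8-bit inputs.
inline uint32_t DivideBy255Rounded(uint32_t x)
{
    assert(x <= 255u * 255u);
    uint32_t t = x + 128;
    return (t + (t >> 8)) >> 8;
}

// Hypothetical scalar reference with the same argument order as the
// Avx2::AlphaBlending call above. Assumed formula for every channel c:
//   dst[c] = round((src[c] * alpha + dst[c] * (255 - alpha)) / 255)
inline void AlphaBlendingReference(const uint8_t* src, size_t srcStride,
    size_t width, size_t height, size_t channelCount,
    const uint8_t* alpha, size_t alphaStride, uint8_t* dst, size_t dstStride)
{
    for (size_t y = 0; y < height; ++y)
    {
        const uint8_t* s = src + y * srcStride;
        const uint8_t* a = alpha + y * alphaStride;
        uint8_t* d = dst + y * dstStride;
        for (size_t x = 0; x < width; ++x)
        {
            uint32_t w = a[x]; // one alpha value per pixel, shared by all channels
            for (size_t c = 0; c < channelCount; ++c)
            {
                size_t i = x * channelCount + c;
                uint32_t sum = s[i] * w + d[i] * (255u - w);
                d[i] = static_cast<uint8_t>(DivideBy255Rounded(sum));
            }
        }
    }
}
```

For example, with `>> 8` blending white over white at alpha = 128 gives (255 * 127 + 255 * 128) >> 8 = 65025 >> 8 = 254, while `DivideBy255Rounded(65025)` returns 255. When comparing this reference against SIMD output, allow a difference of 1, since vectorized code may round differently.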
HowToExpect commented 2 years ago

[attached image: update]

HowToExpect commented 2 years ago

With Avx2::AlphaBlending the elapsed time is 5000 us; with InnerMixtureImage it is 3000 us.
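One factor that can make the scalar loop look fast is the image content: InnerMixtureImage skips pixels with alpha 0, only copies pixels with alpha 255, and does blend arithmetic just for the rest. If update.png is mostly fully transparent or fully opaque, the scalar loop does little work, whereas a SIMD routine that treats every pixel the same way (an assumption about Avx2::AlphaBlending, not something stated in this thread) does not benefit. A minimal sketch for measuring that distribution; `AlphaStats` and `CountAlpha` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: classifies the pixels of an 8-bit alpha plane the
// same way the branches in InnerMixtureImage do.
struct AlphaStats
{
    size_t transparent = 0; // alpha == 0: pixel skipped
    size_t opaque = 0;      // alpha == 255: plain copy
    size_t partial = 0;     // 0 < alpha < 255: full blend arithmetic
};

inline AlphaStats CountAlpha(const uint8_t* alpha, size_t stride, size_t width, size_t height)
{
    AlphaStats stats;
    for (size_t y = 0; y < height; ++y)
    {
        const uint8_t* row = alpha + y * stride; // stride may include padding bytes
        for (size_t x = 0; x < width; ++x)
        {
            if (row[x] == 0)
                ++stats.transparent;
            else if (row[x] == 255)
                ++stats.opaque;
            else
                ++stats.partial;
        }
    }
    return stats;
}
```

In the program from the first comment it could be called as `CountAlpha((const uint8_t*)pChannel4->imageData, pChannel4->widthStep, pChannel4->width, pChannel4->height)`.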

ermig1979 commented 2 years ago

Hello!

1) I would not recommend calling Avx2::AlphaBlending directly. It is not part of the Simd Library API; use SimdAlphaBlending instead.
2) You can test the performance of this function with the 'Test' application. For example:

./Test -fi=AlphaBlending -w=1920 -h=1080 -fe=Uniform

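The results show that AVX2 gives a significant speedup over the Base code: the Bs/A2 column (Base time divided by Avx2 time) ranges from 5.57 to 7.09. Times are in milliseconds, and the API timings match the Avx2 ones, i.e. SimdAlphaBlending uses the AVX2 code path on this machine.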

--------------------------------------------------------------------------------------------
| Function         |   API  Base  Sse2 Sse41  Avx2 | Bs/S2 Bs/S4 Bs/A2 | Bs/S2 S2/S4 S4/A2 |
--------------------------------------------------------------------------------------------
| Common, ms       | 0.522 3.205 1.084 0.721 0.522 |  2.96  4.45  6.14 |  2.96  1.50  1.38 |
--------------------------------------------------------------------------------------------
| AlphaBlending[1] | 0.215 1.338 0.305 0.299 0.214 |  4.38  4.47  6.27 |  4.38  1.02  1.40 |
| AlphaBlending[2] | 0.476 2.730 0.647 0.660 0.477 |  4.22  4.14  5.72 |  4.22  0.98  1.38 |
| AlphaBlending[3] | 0.704 5.091 4.984 0.971 0.718 |  1.02  5.24  7.09 |  1.02  5.13  1.35 |
| AlphaBlending[4] | 1.029 5.677 1.403 1.408 1.019 |  4.05  4.03  5.57 |  4.05  1.00  1.38 |
--------------------------------------------------------------------------------------------
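The code in the first comment times a single call, which is easily distorted by cache state, CPU frequency changes and scheduler noise, so a 5000 us vs 3000 us comparison from one run each says little. If you want to time your own calls outside the Test application, a minimal sketch is to warm up once, repeat the call, and report the median. `MedianMicroseconds` is a hypothetical helper, not part of Simd or OpenCV:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `run` once as a warm-up, then `repeats` more
// times, and returns the median duration in microseconds.
inline long long MedianMicroseconds(const std::function<void()>& run, int repeats = 21)
{
    assert(repeats > 0);
    run(); // warm-up call, not measured
    std::vector<long long> samples;
    samples.reserve(repeats);
    for (int i = 0; i < repeats; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        run();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    }
    // Median is less sensitive to occasional slow runs than the mean.
    std::nth_element(samples.begin(), samples.begin() + repeats / 2, samples.end());
    return samples[repeats / 2];
}
```

For example, wrap `InnerMixtureImage(pUpdate, pImg, 0, 0)` or the `SimdAlphaBlending` call in a lambda and pass it in. Repeated calls blend onto the same `pImg` again, which changes the output image but not the amount of work InnerMixtureImage does, since its branches depend only on the source alpha.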