optimization of batch renderer

holgk commented 9 years ago

Hi! I don't know if this is the correct section or if I should post this as a pull request, so forgive me if it's in the wrong place.

I'm still working through the YouTube videos, so I'm a little behind. Delete this comment if it's out of date.

I saw that your batch renderer calculates the vertex positions on every submit. This is quite inefficient.

My suggestion is a rectangle class/struct that precalculates all vertices in the constructor call of a sprite. This works as long as the size of a sprite doesn't change. I was able to achieve an 8-10% higher frame rate than before.

My example: Rect header (I put it into math):

struct Rect
{
    vec3 topLeftPositon;
    vec3 topRightPositon;
    vec3 bottomLeftPositon;
    vec3 bottomRightPositon;

    enum Alignment
    {
        BottomLeft,
        BottomRight,
        TopLeft,
        TopRight,
        CenteredLeft,
        CenteredRight,
        BottomCentered,
        TopCentered,
        Centered
    };

    Rect(Alignment alignment = Alignment::BottomLeft);
    Rect(float x, float y, float width, float height, Alignment alignment = Alignment::BottomLeft);

    inline const vec3& GetTopLeftPositon() const { return topLeftPositon; }
    inline const vec3& GetTopRightPositon() const { return topRightPositon; }
    inline const vec3& GetBottomLeftPositon() const { return bottomLeftPositon; }
    inline const vec3& GetBottomRightPositon() const { return bottomRightPositon; }
};

Rect cpp implementation:

Rect::Rect(Alignment alignment)
    : Rect(1, 1, 1, 1, alignment)
{

}

Rect::Rect(float x, float y, float width, float height, Alignment alignment)
{
    float halfWidth = 0.5f * width;
    float halfHeight = 0.5f * height;

    switch (alignment)
    {
    case Alignment::BottomLeft:
        topLeftPositon = vec3(x, y + height, 0);
        topRightPositon = vec3(x + width, y + height, 0);
        bottomLeftPositon = vec3(x, y, 0);
        bottomRightPositon = vec3(x + width, y, 0);
        break;
    case Alignment::BottomCentered:
        topLeftPositon = vec3(x - halfWidth, y + height, 0);
        topRightPositon = vec3(x + halfWidth, y + height, 0);
        bottomLeftPositon = vec3(x - halfWidth, y, 0);
        bottomRightPositon = vec3(x + halfWidth, y, 0);
        break;
    case Alignment::BottomRight:
        topLeftPositon = vec3(x - width, y + height, 0);
        topRightPositon = vec3(x, y + height, 0);
        bottomLeftPositon = vec3(x - width, y, 0);
        bottomRightPositon = vec3(x, y, 0);
        break;
    case Alignment::CenteredLeft:
        topLeftPositon = vec3(x, y + halfHeight, 0);
        topRightPositon = vec3(x + width, y + halfHeight, 0);
        bottomLeftPositon = vec3(x, y - halfHeight, 0);
        bottomRightPositon = vec3(x + width, y - halfHeight, 0);
        break;
    case Alignment::Centered:
        topLeftPositon = vec3(x - halfWidth, y + halfHeight, 0);
        topRightPositon = vec3(x + halfWidth, y + halfHeight, 0);
        bottomLeftPositon = vec3(x - halfWidth, y - halfHeight, 0);
        bottomRightPositon = vec3(x + halfWidth, y - halfHeight, 0);
        break;
    case Alignment::CenteredRight:
        topLeftPositon = vec3(x - width, y + halfHeight, 0);
        topRightPositon = vec3(x, y + halfHeight, 0);
        bottomLeftPositon = vec3(x - width, y - halfHeight, 0);
        bottomRightPositon = vec3(x, y - halfHeight, 0);
        break;
    case Alignment::TopLeft:
        topLeftPositon = vec3(x, y, 0);
        topRightPositon = vec3(x + width, y, 0);
        bottomLeftPositon = vec3(x, y - height, 0);
        bottomRightPositon = vec3(x + width, y - height, 0);
        break;
    case Alignment::TopCentered:
        topLeftPositon = vec3(x - halfWidth, y, 0);
        topRightPositon = vec3(x + halfWidth, y, 0);
        bottomLeftPositon = vec3(x - halfWidth, y - height, 0);
        bottomRightPositon = vec3(x + halfWidth, y - height, 0);
        break;
    case Alignment::TopRight:
        topLeftPositon = vec3(x - width, y, 0);
        topRightPositon = vec3(x, y, 0);
        bottomLeftPositon = vec3(x - width, y - height, 0);
        bottomRightPositon = vec3(x, y - height, 0);
        break;
    default:
        topLeftPositon = vec3(x, y + height, 0);
        topRightPositon = vec3(x + width, y + height, 0);
        bottomLeftPositon = vec3(x, y, 0);
        bottomRightPositon = vec3(x + width, y, 0);
        break;
    }
}

A sprite has a rect member and in the constructor this rect is generated. So the batch renderer submit function looks like this (I commented out the old code):

void BatchRenderer2D::Submit(const Renderable2D* renderable)
{
    //const vec3& position = renderable->GetPosition();
    //const vec2& size = renderable->GetSize();
    const vec4& color = renderable->GetColor();

    int r = color.x * 255.0f;
    int g = color.y * 255.0f;
    int b = color.z * 255.0f;
    int a = color.w * 255.0f;
    unsigned int col = a << 24 | b << 16 | g << 8 | r;

    const Rect* rect = ((Sprite*)renderable)->GetRect();

    //buffer->vertex = position;
    buffer->vertex = rect->GetBottomLeftPositon();
    buffer->color = col;
    buffer++;

    //buffer->vertex = vec3(position.x, position.y + size.y, position.z);
    buffer->vertex = rect->GetTopLeftPositon();
    buffer->color = col;
    buffer++;

    //buffer->vertex = vec3(position.x + size.x, position.y + size.y, position.z);
    buffer->vertex = rect->GetTopRightPositon();
    buffer->color = col;
    buffer++;

    //buffer->vertex = vec3(position.x + size.x, position.y, position.z);
    buffer->vertex = rect->GetBottomRightPositon();
    buffer->color = col;
    buffer++;

    indexCount += 6;
}

I hope this is useful for you.

Cyraxx13 commented 9 years ago

Hello!

Since my post is about the BatchRenderer as well, I didn't want to start a new topic.

int r = color.x * 255.0f;
int g = color.y * 255.0f;
int b = color.z * 255.0f;
int a = color.w * 255.0f;

c = a << 24 | b << 16 | g << 8 | r;

Instead of doing this on every submit, you could precalculate the integer color in the Renderable2D constructor and store it as an unsigned int instead of a vec4.
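
A minimal sketch of that idea, not Sparky's actual code: the vec4 stand-in, PackColor, and CachedColorRenderable are hypothetical names used only for illustration. The color is packed once in the constructor (and again only when it is changed), so a submit would just copy the stored integer.

```cpp
#include <cstdint>

// Hypothetical stand-in for Sparky's vec4; only the four float components matter here.
struct vec4 { float x, y, z, w; };

// Packs a normalized RGBA color (each component in [0, 1]) into one 32-bit
// value laid out as 0xAABBGGRR, using the same shifts as the snippet above.
inline std::uint32_t PackColor(const vec4& color)
{
    const std::uint32_t r = static_cast<std::uint32_t>(color.x * 255.0f);
    const std::uint32_t g = static_cast<std::uint32_t>(color.y * 255.0f);
    const std::uint32_t b = static_cast<std::uint32_t>(color.z * 255.0f);
    const std::uint32_t a = static_cast<std::uint32_t>(color.w * 255.0f);
    return a << 24 | b << 16 | g << 8 | r;
}

// Hypothetical renderable that stores the packed color instead of a vec4.
class CachedColorRenderable
{
public:
    explicit CachedColorRenderable(const vec4& color)
        : m_Color(PackColor(color)) {}

    // Repack only when the color actually changes.
    void SetColor(const vec4& color) { m_Color = PackColor(color); }

    // A submit function would read this value directly, with no per-submit math.
    std::uint32_t GetColor() const { return m_Color; }

private:
    std::uint32_t m_Color;
};
```

This also shrinks the stored color from 16 bytes to 4, assuming vec4 is four 4-byte floats.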

holgk commented 9 years ago

Hi Cyraxx13!

That's true. Actually, all data should be precalculated and cached. If something changes, you only need to recalculate the affected data, which is less CPU intensive than doing the calculation every frame. The only downside is higher RAM usage. But who cares about a few more bytes in RAM? ;) I don't think higher RAM consumption is a bottleneck any more, because everyone has a lot (it could only be problematic on mobile devices).
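
The "recalculate only when something changed" idea is usually done with a dirty flag. The following is a rough sketch under assumed names (vec2 stand-in, CachedBounds, GetRecalcCount are all hypothetical and not part of Sparky); it caches a single derived corner and counts how often it is recomputed.

```cpp
// Hypothetical stand-in for a 2D vector type.
struct vec2 { float x, y; };

// Caches one derived value (the top-right corner) and recomputes it lazily,
// only after position or size has been modified.
class CachedBounds
{
public:
    CachedBounds(vec2 position, vec2 size)
        : m_Position(position), m_Size(size) {}

    void SetPosition(vec2 position) { m_Position = position; m_Dirty = true; }
    void SetSize(vec2 size)         { m_Size = size;         m_Dirty = true; }

    // Returns the cached corner, refreshing it first if the inputs changed.
    const vec2& GetTopRight()
    {
        if (m_Dirty)
        {
            m_TopRight = vec2{ m_Position.x + m_Size.x, m_Position.y + m_Size.y };
            m_Dirty = false;
            ++m_RecalcCount;
        }
        return m_TopRight;
    }

    // Exposed only so the effect of the cache can be observed.
    int GetRecalcCount() const { return m_RecalcCount; }

private:
    vec2 m_Position;
    vec2 m_Size;
    vec2 m_TopRight{};
    bool m_Dirty = true;      // nothing has been computed yet
    int  m_RecalcCount = 0;
};
```

Whether this is a win depends on how expensive the cached computation is compared with the extra memory that has to be read on each access.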

Maxchii commented 9 years ago

Hi Cyraxx13!

Nicely spotted! This eliminates some unnecessary calculations.

TheCherno commented 9 years ago

Due to the effects of CPU caching, adding an additional vertex cache to each Renderable would actually slow things down. We have quite a large number of sprites, and if each of them had an additional 48 bytes of memory (the four vec3 members of your Rect struct), more memory reads would be necessary to provide the buffer with our data.

Accessing memory that isn't CPU cached has a high latency (~200 cycles), MUCH higher than the simple additions (and stack pushes) that we need to do to calculate the vertex positions and send them to the buffer (those are like ~5 cycles each). Therefore you'll find that caching vertex positions will actually be significantly slower than just recalculating them.
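
To put the memory side of this argument into numbers, here is a small sketch. It assumes vec3 is three 4-byte floats with no padding (true on common desktop compilers, but an assumption nonetheless); CachedCorners and ExtraCacheBytes are hypothetical names that mirror the four corner members of the Rect struct posted above.

```cpp
#include <cstddef>

// Hypothetical stand-in for Sparky's vec3: three floats.
struct vec3 { float x, y, z; };

// Mirrors the four cached corner positions of the proposed Rect struct.
struct CachedCorners
{
    vec3 topLeft;
    vec3 topRight;
    vec3 bottomLeft;
    vec3 bottomRight;
};

// These hold under the stated assumption of 4-byte floats and no padding.
static_assert(sizeof(vec3) == 12, "vec3 is assumed to be three 4-byte floats");
static_assert(sizeof(CachedCorners) == 48, "four vec3 corners, 12 bytes each");

// Extra memory the renderer would have to read if every sprite carried the cache.
constexpr std::size_t ExtraCacheBytes(std::size_t spriteCount)
{
    return spriteCount * sizeof(CachedCorners);
}
```

For 7,000 sprites this comes to 336,000 extra bytes that have to be streamed through the cache every frame, in exchange for saving a handful of additions per sprite.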

As for your "8-10% faster", that's probably because you were running Sparky in debug rather than release mode. Please make sure you run Sparky in release mode when analyzing performance. :)

TheCherno commented 9 years ago

Now the color calculation, on the other hand, is a valid concern. That should really be a separate topic. I'll address this in the upcoming maintenance episode.

Cyraxx13 commented 9 years ago

I'm not sure what your plan is, but I think most of the time we'll render groups with only the identity matrix on the transformation stack, unless we render text, UI, and the like.

If that is the case it might be better to calculate the vertex position like this: m_Buffer->vertex = transStack.size == 1 ? vec3(x,y,z) : transBack * vec3(x,y,z);

In this case you could avoid a lot of unnecessary operations, but of course this depends on how many things you want to render with the identity matrix only.
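
A self-contained sketch of that check, assuming the transformation stack is a std::vector that always keeps the identity matrix as its bottom element. The vec3 and mat4 types and the TransformVertex helper are hypothetical stand-ins, not Sparky's real classes.

```cpp
#include <vector>

// Hypothetical stand-ins for the math types.
struct vec3 { float x, y, z; };

struct mat4
{
    float elements[16]; // column-major: elements[column * 4 + row]

    static mat4 Identity()
    {
        mat4 m{}; // zero-initialized
        m.elements[0] = m.elements[5] = m.elements[10] = m.elements[15] = 1.0f;
        return m;
    }

    static mat4 Translation(const vec3& t)
    {
        mat4 m = Identity();
        m.elements[12] = t.x;
        m.elements[13] = t.y;
        m.elements[14] = t.z;
        return m;
    }
};

// Transforms a point (w is implicitly 1) by a column-major matrix.
inline vec3 operator*(const mat4& m, const vec3& v)
{
    return vec3{
        m.elements[0] * v.x + m.elements[4] * v.y + m.elements[8]  * v.z + m.elements[12],
        m.elements[1] * v.x + m.elements[5] * v.y + m.elements[9]  * v.z + m.elements[13],
        m.elements[2] * v.x + m.elements[6] * v.y + m.elements[10] * v.z + m.elements[14]
    };
}

// Skips the matrix multiply when only the identity is on the stack.
// Assumes the stack is never empty and its bottom element is the identity.
inline vec3 TransformVertex(const std::vector<mat4>& transformationStack, const vec3& vertex)
{
    return transformationStack.size() == 1
        ? vertex
        : transformationStack.back() * vertex;
}
```

The branch itself has a cost, so this only pays off if most submits really do happen with nothing but the identity on the stack.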

holgk commented 9 years ago

Hi TheCherno, you are right. I didn't think about CPU caching and the limited size of the CPU cache. I tested it in both release and debug. But now that I think about it, I tested with only 6500-7000 sprites, and my rendering started directly after allocation, so I think most of my data was still in the CPU cache and therefore fast to access. But now we have added so much other stuff. I was wondering why your performance was better than mine. Now I know. Thanks for the help.

TheCherno commented 9 years ago

Color caching added in Ep. 20.

TheCherno / Sparky

optimization of batch renderer #10