Replaying glGetTexImage crashes: wrong buffer size

stgatilov commented 1 year ago

With this repro code (full code: bug_repro.zip):

    static const int W = 1024;
    static const int H = 1024;

    vector<uint8_t> source(W * H * 4);
    // ... fill data

    GLuint texId;
    glGenTextures(1, &texId);
    glBindTexture(GL_TEXTURE_2D, texId);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0, GL_RGBA, GL_UNSIGNED_BYTE, source.data());
    glBindTexture(GL_TEXTURE_2D, 0);

    vector<uint8_t> readback(W * H * 4);
    glBindTexture(GL_TEXTURE_2D, texId);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, readback.data());
    glBindTexture(GL_TEXTURE_2D, 0);

I record a trace, then try to replay it. On some GL implementations, it replays normally, but on NVIDIA it crashes inside glGetTexImage.

Here is the place which causes the problem (retrace_glGetTexImage):

     std::vector<char> buffer;
    if (!_pack_buffer) {
     GLint max_tex_size;
     glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_tex_size);
     buffer.resize(max_tex_size * max_tex_size * max_tex_size);
    }
    pixels = buffer.data();
    if (currentContext && !currentContext->insideList && !currentContext->insideBeginEnd && retrace::profiling) {
        glretrace::beginProfile(call, false);
    }
    glGetTexImage(target, level, format, type, pixels);
    if (currentContext && !currentContext->insideList && !currentContext->insideBeginEnd && retrace::profiling) {
        glretrace::endProfile(call, false);
    }

Here max_tex_size is usually some power-of-two greater than 2K (e.g. 16K on AMD and 32K on NVIDIA), so a cube of it overflows GLint and becomes zero. Hence, the buffer is resized to zero size. Then we pass its pointer (which is most likely NULL, but not necessarily) to glGetTexImage. Drivers are not required to check for this case, and apparently NVIDIA implementation does not check.

So the question is: how this should be fixed?

Simple ignore this call.
Set a sane upper limit on buffer size, e.g. it is min(S^3, 1GB).
Request proper maximum size from driver (3D textures have stricter limits than GL_MAX_TEXTURE_SIZE) and allocate space for it (I guess that would be dozens of gigabytes).
Use GL introspection to learn about the sizes of the texture and allocate memory for it.
Maintain maximum number of pixels in any texture storage specified so far. Then allocate buffer for this size.

jrfonseca commented 1 year ago

Thanks for the detailed diagnostic.

The ideal would be option 4, but it would be a fair amount of work, especially as one needs to be careful about supported GL version/extensions to get it right.

As short term gap I suggest option 3 plus 2 as fallback. That is:

use GL_MAX_3D_TEXTURE_SIZE/GL_MAX_CUBE_MAP_TEXTURE_SIZE/GL_MAX_ARRAY_TEXTURE_LAYERS depending on the target value; and
use uint64_t arithmetic and clamp the result to something reasonable (e.g, 1GB on 32bits processes 4GB on 64bits.)

Skipping the call would make performance profiling results severely biased, therefore best to avoid.

Wanna post a PR?

stgatilov commented 1 year ago

Yes, I think I can do a PR with p.3 + p.2. Note that at least on NVIDIA, it will effectively be p.2.

I must admit I never used profiling in apitrace. Won't zero-filling 4GB of RAM on every glGetTexImage call make profiling results crazy anyway?

Would it make sense to cache the buffer into global variable and not reallocate/refill it on second/third/etc. calls? It can greatly accelerate replaying for rare cases when it is called too often.

As for p.4, webpage says:

To determine the required size of pixels, use glGetTexLevelParameter to determine the dimensions of the internal texture image, then scale the required number of pixels by the storage required for each pixel, based on format and type. Be sure to take the pixel storage parameters into account, especially GL_PACK_ALIGNMENT.

So:

Determine dimensions of texture, taking into account all the types (1D, 3D, array, cubemap). Luckily, glGetTexLevelParameter can return dimensions since GL 1.1, although one must not e.g. ask for depth of a 2D texture.
Determine bytes per pixel from format and type. Maybe just take maximum ever possible? I think the largest is 16 bytes for GL_RGBA + GL_FLOAT. Maybe set to 32 for doubles, although I'm not sure it is possible to query doubles even with extensions.
Handle PACK padding/alignment (reference). There are a lot of them unfortunately =( They can specify arbitrary offset, stride, and alignment. I don't know how to handle this properly. This issue affects p.5 of my original proposal too.

Looks like a lot of trouble for a feature which is bad for performance anyway.

stgatilov commented 1 year ago

I created two different PRs:

https://github.com/apitrace/apitrace/pull/858 --- trying to honestly implement p.3 + p.2, i.e. query all the max sizes. Slow and messy.
https://github.com/apitrace/apitrace/pull/857 --- simple p.2 without any queries. Just allocate 1 or 4 GB buffer and use it. Also cache it forever to make repeated calls faster.

To be honest, I like the second PR more. Even if it is conceptually dirtier, it is simpler, more reliable (what if driver returns wrong value for MAX_XXX?) and faster.

apitrace / apitrace

Replaying glGetTexImage crashes: wrong buffer size #853