apitrace / apitrace

Tools for tracing OpenGL, Direct3D, and other graphics APIs
https://apitrace.github.io/
MIT License

Replaying glGetTexImage crashes: wrong buffer size #853

Open stgatilov opened 1 year ago

stgatilov commented 1 year ago

With this repro code (full code: bug_repro.zip):

    static const int W = 1024;
    static const int H = 1024;

    vector<uint8_t> source(W * H * 4);
    // ... fill data

    GLuint texId;
    glGenTextures(1, &texId);
    glBindTexture(GL_TEXTURE_2D, texId);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0, GL_RGBA, GL_UNSIGNED_BYTE, source.data());
    glBindTexture(GL_TEXTURE_2D, 0);

    vector<uint8_t> readback(W * H * 4);
    glBindTexture(GL_TEXTURE_2D, texId);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, readback.data());
    glBindTexture(GL_TEXTURE_2D, 0);

I record a trace, then try to replay it. On some GL implementations, it replays normally, but on NVIDIA it crashes inside glGetTexImage.

Here is the place which causes the problem (retrace_glGetTexImage):

    std::vector<char> buffer;
    if (!_pack_buffer) {
        GLint max_tex_size;
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_tex_size);
        buffer.resize(max_tex_size * max_tex_size * max_tex_size);
    }
    pixels = buffer.data();
    if (currentContext && !currentContext->insideList && !currentContext->insideBeginEnd && retrace::profiling) {
        glretrace::beginProfile(call, false);
    }
    glGetTexImage(target, level, format, type, pixels);
    if (currentContext && !currentContext->insideList && !currentContext->insideBeginEnd && retrace::profiling) {
        glretrace::endProfile(call, false);
    }

Here max_tex_size is usually some power of two greater than 2K (e.g. 16K on AMD and 32K on NVIDIA), so its cube overflows GLint and wraps to zero. Hence the buffer is resized to zero size. Then we pass its data pointer (most likely NULL, but not necessarily) to glGetTexImage. Drivers are not required to check for this case, and apparently the NVIDIA implementation does not.
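
To make the wrap-around concrete, here is a tiny standalone example (not apitrace code), assuming the 32K limit reported on NVIDIA; computing the same product in size_t shows how large the intended allocation actually is:

    // Standalone illustration of the overflow described above (assumes a 32K limit).
    #include <cstdint>
    #include <cstdio>

    int main() {
        int32_t max_tex_size = 32768;  // GLint is a 32-bit signed integer
        // Signed overflow is formally undefined behaviour; in practice 2^45 wraps to 0 here.
        int32_t wrapped = max_tex_size * max_tex_size * max_tex_size;
        // The intended value is 2^45 bytes, i.e. 32 TiB.
        size_t intended = (size_t)max_tex_size * max_tex_size * max_tex_size;
        printf("GLint product = %d, size_t product = %zu\n", wrapped, intended);
        return 0;
    }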


So the question is: how should this be fixed?

  1. Simply ignore this call.
  2. Set a sane upper limit on the buffer size, e.g. min(S^3, 1 GB).
  3. Request the proper maximum size from the driver (3D textures have stricter limits than GL_MAX_TEXTURE_SIZE) and allocate space for it (I guess that would be dozens of gigabytes).
  4. Use GL introspection to learn the actual size of the texture being read back and allocate exactly that much memory.
  5. Track the maximum number of pixels in any texture storage specified so far, and allocate a buffer of that size.
jrfonseca commented 1 year ago

Thanks for the detailed diagnostic.

The ideal would be option 4, but it would be a fair amount of work, especially as one needs to be careful about supported GL version/extensions to get it right.

As a short-term stopgap I suggest option 3, with option 2 as a fallback: query the proper per-target maximum from the driver, and cap the resulting allocation at a sane upper limit.

Skipping the call would make the performance profiling results severely biased, so that is best avoided.
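
Roughly something like the sketch below (just an illustration, not an agreed design: the helper name, the 16-byte worst case per pixel, and the 1 GiB cap are placeholders, and it assumes the GL headers already available in the glretrace sources):

    // Sketch only: pick a per-target GL_MAX_* limit instead of GL_MAX_TEXTURE_SIZE,
    // compute the worst case in size_t, and clamp it to a sane cap (option 2).
    static size_t
    worstCaseTexImageSize(GLenum target)
    {
        GLint maxSize = 0;
        size_t bytes;
        switch (target) {
        case GL_TEXTURE_3D:
            glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &maxSize);
            bytes = (size_t)maxSize * maxSize * maxSize;
            break;
        case GL_TEXTURE_CUBE_MAP_POSITIVE_X:  // ...and the other cube faces
            glGetIntegerv(GL_MAX_CUBE_MAP_TEXTURE_SIZE, &maxSize);
            bytes = (size_t)maxSize * maxSize;
            break;
        default:
            glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);
            bytes = (size_t)maxSize * maxSize;
            break;
        }
        bytes *= 16;                         // worst-case bytes per pixel (e.g. GL_RGBA32F)
        const size_t cap = (size_t)1 << 30;  // example cap: 1 GiB
        return bytes < cap ? bytes : cap;
    }

With numbers like a 32K 2D limit the computed worst case still exceeds any such cap, so in practice this tends to reduce to option 2, as noted below.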

Wanna post a PR?

stgatilov commented 1 year ago

Yes, I think I can do a PR with p.3 + p.2. Note that at least on NVIDIA, it will effectively be p.2.

I must admit I have never used profiling in apitrace. Won't zero-filling 4 GB of RAM on every glGetTexImage call distort the profiling results anyway?

Would it make sense to cache the buffer in a global variable and not reallocate/refill it on the second, third, etc. calls? That could greatly speed up replaying in the rare cases where this call happens very often.
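
Just to illustrate the idea (hypothetical helper, not existing apitrace code; assumes <vector> is available as in the glretrace sources):

    // Hypothetical sketch of the caching idea: one buffer reused across
    // retrace_glGetTexImage calls, grown (and zero-filled) only when needed.
    static char *
    getScratchBuffer(size_t size)
    {
        static std::vector<char> scratch;   // lives for the whole replay
        if (scratch.size() < size) {
            scratch.resize(size);           // reallocates and zero-fills only on growth
        }
        return scratch.data();
    }

The obvious trade-off is that the largest allocation stays alive for the rest of the replay.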


As for p.4, the glGetTexImage reference page says:

To determine the required size of pixels, use glGetTexLevelParameter to determine the dimensions of the internal texture image, then scale the required number of pixels by the storage required for each pixel, based on format and type. Be sure to take the pixel storage parameters into account, especially GL_PACK_ALIGNMENT.

So one would need glGetTexLevelParameter queries for the level's dimensions, a mapping from format/type to bytes per pixel, and careful handling of the pack state.

Looks like a lot of trouble for a feature which is bad for performance anyway.
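
For reference, for the simple GL_TEXTURE_2D / GL_RGBA / GL_UNSIGNED_BYTE case the introspection would look roughly like the fragment below (sketch only; target, level, format and type come from the decoded call as in retrace_glGetTexImage, and a real implementation would need a full format/type-to-size table and the complete GL_PACK_* state, not just GL_PACK_ALIGNMENT):

    // Sketch of option 4 for the simple 2D case: size the buffer exactly
    // by querying the level dimensions and honouring GL_PACK_ALIGNMENT.
    GLint width = 0, height = 0;
    glGetTexLevelParameteriv(target, level, GL_TEXTURE_WIDTH, &width);
    glGetTexLevelParameteriv(target, level, GL_TEXTURE_HEIGHT, &height);

    size_t bytesPerPixel = 4;                  // GL_RGBA + GL_UNSIGNED_BYTE
    GLint packAlignment = 4;
    glGetIntegerv(GL_PACK_ALIGNMENT, &packAlignment);

    // Round each row up to the pack alignment (always a power of two: 1, 2, 4 or 8).
    size_t rowStride = (size_t)width * bytesPerPixel;
    rowStride = (rowStride + packAlignment - 1) & ~(size_t)(packAlignment - 1);

    std::vector<char> buffer(rowStride * height);
    glGetTexImage(target, level, format, type, buffer.data());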

stgatilov commented 1 year ago

I created two different PRs:

To be honest, I like the second PR more. Even though it is conceptually dirtier, it is simpler, more reliable (what if the driver returns a wrong value for some MAX_XXX query?), and faster.