stgatilov opened this issue 1 year ago
Thanks for the detailed diagnostic.
The ideal would be option 4, but it would be a fair amount of work, especially as one needs to be careful about supported GL version/extensions to get it right.
As a short-term gap I suggest option 3, plus option 2 as a fallback. That is:

- use `GL_MAX_3D_TEXTURE_SIZE` / `GL_MAX_CUBE_MAP_TEXTURE_SIZE` / `GL_MAX_ARRAY_TEXTURE_LAYERS` depending on the target value; and
- use `uint64_t` arithmetic and clamp the result to something reasonable (e.g. 1 GB on 32-bit processes, 4 GB on 64-bit).

Skipping the call would make performance profiling results severely biased, therefore best to avoid.
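For concreteness, roughly something along these lines (an illustrative sketch, not apitrace's actual code; the function name, the GL loader header, and the clamp constants are placeholders based on the suggestion above):

```cpp
#include <algorithm>
#include <cstdint>
#include <GL/glew.h>   // stand-in for whatever GL loader the tree already uses

// Pick the limit that matches the texture target, do the arithmetic in
// 64 bits, then clamp to a sane upper bound before allocating.
static size_t
estimateWorstCaseTexImageSize(GLenum target, uint64_t bytesPerPixel)
{
    GLint dim = 0, layers = 1;
    switch (target) {
    case GL_TEXTURE_3D:
        glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &dim);
        layers = dim;                                   // width * height * depth
        break;
    case GL_TEXTURE_CUBE_MAP:
        glGetIntegerv(GL_MAX_CUBE_MAP_TEXTURE_SIZE, &dim);
        break;                                          // one face is read at a time
    case GL_TEXTURE_2D_ARRAY:
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &dim);
        glGetIntegerv(GL_MAX_ARRAY_TEXTURE_LAYERS, &layers);
        break;
    default:
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &dim);
        break;
    }

    uint64_t size = uint64_t(dim) * uint64_t(dim) * uint64_t(layers) * bytesPerPixel;

    // Clamp: e.g. 1 GB on 32-bit processes, 4 GB on 64-bit.
    const uint64_t limit = (sizeof(void *) == 4) ? (1ULL << 30) : (4ULL << 30);
    return size_t(std::min(size, limit));
}
```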
Wanna post a PR?
Yes, I think I can do a PR with p.3 + p.2. Note that at least on NVIDIA, it will effectively be p.2.
I must admit I never used profiling in apitrace. Won't zero-filling 4GB of RAM on every glGetTexImage call make profiling results crazy anyway?
Would it make sense to cache the buffer in a global variable and not reallocate/refill it on second/third/etc. calls? That could greatly accelerate replaying in the rare cases where this call occurs very often.
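For illustration, a minimal sketch of that caching idea (hypothetical helper, not existing apitrace code):

```cpp
#include <vector>
#include <cstddef>

// Keep one lazily grown scratch buffer for the whole replay instead of
// reallocating (and zero-filling) it on every glGetTexImage call.
static void *
getScratchBuffer(size_t size)
{
    static std::vector<char> buffer;
    if (buffer.size() < size) {
        buffer.resize(size);   // grows only when a bigger request arrives
    }
    return buffer.data();
}
```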
As for p.4, the webpage says:
To determine the required size of pixels, use glGetTexLevelParameter to determine the dimensions of the internal texture image, then scale the required number of pixels by the storage required for each pixel, based on format and type. Be sure to take the pixel storage parameters into account, especially GL_PACK_ALIGNMENT.
So the required size depends on `format` and `type`. Maybe just take the maximum ever possible? I think the largest is 16 bytes for GL_RGBA + GL_FLOAT. Maybe set it to 32 for doubles, although I'm not sure it is possible to query doubles even with extensions.

Looks like a lot of trouble for a feature which is bad for performance anyway.
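For reference, a rough sketch of what the p.4 route could look like (hypothetical helper, not apitrace code; pack alignment and compressed formats are ignored, and the worst-case 16 bytes per pixel is the GL_RGBA + GL_FLOAT figure mentioned above):

```cpp
#include <algorithm>
#include <cstdint>
#include <GL/glew.h>   // stand-in for the project's GL loader

// Query the level's dimensions and multiply by a worst-case per-pixel size,
// instead of decoding every format/type combination.
static size_t
queryTexImageUpperBound(GLenum target, GLint level)
{
    GLint w = 0, h = 0, d = 0;
    glGetTexLevelParameteriv(target, level, GL_TEXTURE_WIDTH,  &w);
    glGetTexLevelParameteriv(target, level, GL_TEXTURE_HEIGHT, &h);
    glGetTexLevelParameteriv(target, level, GL_TEXTURE_DEPTH,  &d);

    const uint64_t worstCaseBytesPerPixel = 16;   // GL_RGBA + GL_FLOAT
    uint64_t size = uint64_t(std::max(w, 1)) * uint64_t(std::max(h, 1)) *
                    uint64_t(std::max(d, 1)) * worstCaseBytesPerPixel;
    return size_t(size);
}
```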
I created two different PRs:
To be honest, I like the second PR more. Even if it is conceptually dirtier, it is simpler, more reliable (what if the driver returns a wrong value for MAX_XXX?), and faster.
With this repro code (full code: bug_repro.zip):
I record a trace, then try to replay it. On some GL implementations, it replays normally, but on NVIDIA it crashes inside glGetTexImage.
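The snippet itself isn't reproduced here. Purely for illustration (assumed, not the actual contents of bug_repro.zip), the kind of call sequence being described looks roughly like this; any traced glGetTexImage sends the retracer down the worst-case buffer allocation path:

```cpp
// Assumes a current GL context; sizes and formats are arbitrary.
GLuint tex = 0;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 64, 64, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

std::vector<unsigned char> pixels(64 * 64 * 4);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
```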
Here is the place which causes the problem (`retrace_glGetTexImage`):

Here `max_tex_size` is usually some power of two greater than 2K (e.g. 16K on AMD and 32K on NVIDIA), so its cube overflows GLint and becomes zero. Hence, the buffer is resized to zero size. Then we pass its pointer (which is most likely NULL, but not necessarily) to glGetTexImage. Drivers are not required to check for this case, and apparently the NVIDIA implementation does not.
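A simplified illustration of what happens there (not the exact generated retrace code):

```cpp
GLint max_tex_size = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_tex_size);   // e.g. 32768 on NVIDIA

// 32768^3 = 2^45 does not fit in a 32-bit GLint, so the product wraps to 0.
GLint size = max_tex_size * max_tex_size * max_tex_size;

// The scratch buffer is then resized to 0 bytes and its (most likely NULL)
// pointer is handed to glGetTexImage, which writes past it.
```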
So the question is: how should this be fixed?
One of the options is to clamp the allocated size, e.g. to min(S^3, 1GB).