Open flowerfx opened 8 years ago
I've done clipping node performance testing in the past, and I believe the performance issues you are seeing are due to the fact we are reading values from the GPU during clipping node's draw functions.
If you look at CCStencilStateManager.cpp:140, you will see:
glGetIntegerv(GL_STENCIL_WRITEMASK, (GLint *)&_currentStencilWriteMask);
glGetIntegerv(GL_STENCIL_FUNC, (GLint *)&_currentStencilFunc);
glGetIntegerv(GL_STENCIL_REF, &_currentStencilRef);
glGetIntegerv(GL_STENCIL_VALUE_MASK, (GLint *)&_currentStencilValueMask);
glGetIntegerv(GL_STENCIL_FAIL, (GLint *)&_currentStencilFail);
glGetIntegerv(GL_STENCIL_PASS_DEPTH_FAIL, (GLint *)&_currentStencilPassDepthFail);
glGetIntegerv(GL_STENCIL_PASS_DEPTH_PASS, (GLint *)&_currentStencilPassDepthPass);
glGetIntegerv has a severe performance cost. It is listed as a common mistake in OpenGL programming at https://www.opengl.org/wiki/Common_Mistakes, which says:
You find that these functions are slow.
That's normal. Any function of the glGet form will likely be slow. nVidia and ATI/AMD recommend that you avoid them. The GL driver (and also the GPU) prefer to receive information in the up direction. You can avoid all glGet calls if you track the information yourself.
What I don't understand is why we fetch these values from the GPU when they are computed and set on the CPU in the first place. I don't see why, when we set these values on the GPU, we can't also record them in the StencilManager, rather than setting them on the GPU and then having the StencilManager read them back out.
I wrote a version of clipping node for cocos2d-x v2 that avoided this performance penalty by taking an approach similar to the one described above. It won't work with cocos2d-x v3's new rendering pipeline, but I am more than willing to rewrite it for v3 if the engine maintainers see a benefit (unless there is something I am fundamentally misunderstanding).
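To make the idea above concrete, here is a minimal sketch of CPU-side state tracking. It is not the cocos2d-x implementation: StencilState, StencilStateTracker and all of their method names are hypothetical, and the actual GL calls are left as comments so the snippet compiles without a GL context. It assumes every stencil change in the engine goes through the tracker, so the cached copy never goes stale.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the stencil state that the lines above
// read back with glGetIntegerv. Defaults match OpenGL's initial stencil state.
struct StencilState {
    std::uint32_t writeMask     = 0xFFFFFFFFu;
    std::uint32_t func          = 0x0207u;  // GL_ALWAYS
    std::int32_t  ref           = 0;
    std::uint32_t valueMask     = 0xFFFFFFFFu;
    std::uint32_t fail          = 0x1E00u;  // GL_KEEP
    std::uint32_t passDepthFail = 0x1E00u;  // GL_KEEP
    std::uint32_t passDepthPass = 0x1E00u;  // GL_KEEP
};

// Hypothetical tracker: each setter records the value on the CPU at the same
// point where the engine would issue the matching GL call.
class StencilStateTracker {
public:
    void setMask(std::uint32_t mask) {
        current_.writeMask = mask;
        // glStencilMask(mask);
    }

    void setFunc(std::uint32_t func, std::int32_t ref, std::uint32_t valueMask) {
        current_.func = func;
        current_.ref = ref;
        current_.valueMask = valueMask;
        // glStencilFunc(func, ref, valueMask);
    }

    void setOp(std::uint32_t fail, std::uint32_t zfail, std::uint32_t zpass) {
        current_.fail = fail;
        current_.passDepthFail = zfail;
        current_.passDepthPass = zpass;
        // glStencilOp(fail, zfail, zpass);
    }

    // Save the current state before a clipping node draws. No glGet* needed.
    void push() { saved_.push_back(current_); }

    // Restore the most recently saved state after the clipping node is done.
    void pop() {
        assert(!saved_.empty());
        const StencilState s = saved_.back();
        saved_.pop_back();
        setMask(s.writeMask);
        setFunc(s.func, s.ref, s.valueMask);
        setOp(s.fail, s.passDepthFail, s.passDepthPass);
    }

    const StencilState& current() const { return current_; }

private:
    StencilState current_;
    std::vector<StencilState> saved_;
};
```

A clipping node would call push() before configuring the stencil and pop() once its children are drawn, so the previous state is restored from the CPU copy instead of being queried from the driver. The sketch breaks down if any other code changes stencil state directly through GL.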
Hello, I think I have found the cause of this issue, and of the general slowdown on the wp8.1/wp10 platforms when stencil is used.
I noticed that this call in CCRenderer.cpp runs slowly: glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(_indices[0]) * _filledIndex, _indices, GL_STATIC_DRAW); I measured the total time spent in this function per loop: it took about 0.05 to 0.1 sec in a scene with two or more nodes that use clipping (stencil), which means the game drops to ~10 fps.
When I changed the code to glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(_indices[0]) * _filledIndex, _indices, GL_DYNAMIC_DRAW); the fps went up to ~30 fps. Maybe the cocos team should check this problem in the render system on wp8.1/wp10.
Thanks
Steps to Reproduce: