Instancing proposal

solan-solan commented 1 year ago

I remember @DelinWorks mentioned the instancing feature in thread #688. It is a very useful feature and is supported by most modern phones. It is even available as an extension for OpenGL ES 2.0 if you use the ANGLE backend: https://github.com/microsoft/angle/wiki/Using-Instancing (although it looks like the GLSL would need to be 3.0 anyway).

Applying this feature consists of the following steps (for OpenGL):

use glDrawArraysInstanced instead of glDrawArrays, passing the number of instances to draw
use glDrawElementsInstanced instead of glDrawElements, passing the number of instances to draw
use gl_InstanceID in the shader to distinguish vertices belonging to different instances
pass per-instance data (transformation matrix, etc.) through a uniform array or a float texture and index it with gl_InstanceID
there is also glVertexAttribDivisor, which allows passing per-instance data through a vertex buffer; it is optional and could be implemented later, or we need to think about how to integrate it. I don't know whether it is urgent.
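To make the gl_InstanceID idea concrete, here is a small CPU model of what an instanced draw does. It is only an illustration, not GPU code: the names `Vec3` and `emulateInstancedDraw` are hypothetical, and each instance carries just a translation instead of a full transformation matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of an instanced draw, for illustration only.
// On the GPU the vertex shader runs once per (instance, vertex) pair and
// gl_InstanceID tells it which per-instance data to read.
struct Vec3 { float x, y, z; };

// Roughly what glDrawArraysInstanced(mode, 0, vertices.size(), offsets.size())
// asks the GPU to do: process the same vertex data once per instance.
std::vector<Vec3> emulateInstancedDraw(const std::vector<Vec3>& vertices,
                                       const std::vector<Vec3>& instanceOffsets) {
    std::vector<Vec3> out;
    out.reserve(vertices.size() * instanceOffsets.size());
    for (std::size_t instanceId = 0; instanceId < instanceOffsets.size(); ++instanceId) {
        // instanceId plays the role of gl_InstanceID: it indexes the
        // per-instance array (a uniform array or float texture in the proposal).
        const Vec3& t = instanceOffsets[instanceId];
        for (const Vec3& v : vertices) {
            out.push_back({v.x + t.x, v.y + t.y, v.z + t.z});
        }
    }
    return out;
}
```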

Proposed implementation:

Add CustomCommand::DrawType enum values ARRAY_INSTANCE and ELEMENT_INSTANCE to identify the two new draw commands;
Add a new field CustomCommand::num_instance holding the number of copies this command should draw;
Add a new API void CustomCommand::setNumInstance(int num_inst), which sets num_instance and chooses the corresponding draw type based on the previous draw type: if it was ARRAY and num_instance < 2 => ARRAY, otherwise ARRAY_INSTANCE; if it was ELEMENT and num_instance < 2 => ELEMENT, otherwise ELEMENT_INSTANCE. This API should be propagated up the object hierarchy to MeshRenderer. All of this relates to MeshRenderer or custom user classes that are drawn with a CustomCommand.
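A minimal C++ sketch of the draw-type selection described above, assuming a stripped-down, hypothetical `CustomCommand` (the real axmol class has much more state and may name things differently). One assumption goes beyond the proposal: an instanced command falls back to the plain draw type if the count drops below 2 again.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for CustomCommand; only the proposed
// draw-type selection is modelled.
class CustomCommand {
public:
    enum class DrawType { ARRAY, ELEMENT, ARRAY_INSTANCE, ELEMENT_INSTANCE };

    explicit CustomCommand(DrawType type) : _drawType(type) {}

    // Sets the instance count and picks the matching draw type:
    // fewer than 2 instances keeps the plain call, otherwise the instanced one.
    void setNumInstance(int numInst) {
        assert(numInst >= 0);
        _numInstance = numInst;
        const bool instanced = numInst >= 2;
        switch (_drawType) {
        case DrawType::ARRAY:
        case DrawType::ARRAY_INSTANCE:
            _drawType = instanced ? DrawType::ARRAY_INSTANCE : DrawType::ARRAY;
            break;
        case DrawType::ELEMENT:
        case DrawType::ELEMENT_INSTANCE:
            _drawType = instanced ? DrawType::ELEMENT_INSTANCE : DrawType::ELEMENT;
            break;
        }
    }

    DrawType getDrawType() const { return _drawType; }
    int getNumInstance() const { return _numInstance; }

private:
    DrawType _drawType;
    int _numInstance = 1;
};
```

Keeping the decision inside the setter means the renderer only has to switch on the draw type it already reads.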

This approach does not affect performance, since it runs in parallel with the existing drawing code. There is only one thing: maybe it would be better to implement Renderer::drawCustomCommand in a way that avoids an extra if/else statement. Processors use branch prediction for if statements so that their internal pipeline does not stall; on the other hand, if the called function is located far from the call site, it could hurt the cache. It needs testing, but:

Sprites are drawn with TRIANGLES_COMMAND, so as far as I understand, Renderer::drawBatchedTriangles() would need to be changed to use instancing there.

Please feel free to question this approach and give feedback. I have not tested it; I only wanted to find out what your plans are for this feature. Is it on the roadmap at all? Of course, it would be better to implement it for the 2D part as well, since being able to draw a sprite multiple times without batching significantly increases performance.

DelinWorks commented 1 year ago

This feature would be great to have, but it needs a LOT of planning. For 3D objects, instancing could be stored on materials: each material is rendered with the number of objects that use it, and their custom data is stored in a vertex buffer before rendering. This is faster than the traditional method of cramming vertices into a single vertex buffer and passing it to the GPU, which wastes a lot of CPU time sorting, ordering, and modifying vertices. So for complex 3D shapes like spheres, vegetation, and rocks, instancing is a must.

2D is more complex: the two main methods for rendering simple quads are batching and instancing.

2D batching is often the better option. If it doesn't require any additional setup work (like rebuilding vertex buffers) and doesn't use data redundantly, batching always wins in 2D for static objects (like tilemaps): if you use instancing for tilemaps, you would be modifying GPU memory every frame, which batching avoids. On the other hand, if a scene has a lot of moving objects (dynamically changing position, scale, and color), instancing is of course the better option.

IMO, MeshRenderer could have a subclass called InstancedMeshRenderer that takes an InstancedMaterial instead of a normal Material, so as not to confuse users.

The same goes for 2D: we could have a SpriteInstanceNode as an alternative to SpriteBatchNode and let the user decide which option they want.

Hope that helps!

solan-solan commented 1 year ago

Do I understand correctly that you mean implementing a new material subclass with a dedicated instancing shader? The objects that use this material issue their draw commands inside their draw function, and the engine collects these commands and replaces them with a single one that draws them all at once?

DelinWorks commented 1 year ago

Having a draw function called for every instanced mesh defeats the purpose of instancing. A custom data buffer that stores the transformation matrices of all objects needs to be updated when, and only WHEN, an object changes its transform, gets created, or has its dirty-transform flag set. That's how you make full use of the instancing feature. And which class is better suited for this than the material class? That's why embedding this functionality in a material subclass is good and effective OOP-wise.
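The "update only when something changed" rule can be sketched as a CPU-side copy of the instance matrices plus a dirty range. `InstanceTransformBuffer` is a hypothetical name, and `flush()` only counts what would be uploaded; a GL backend would presumably call something like `glBufferSubData` for that range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major 4x4 matrix

// Hypothetical per-material instance buffer: a CPU copy of every instance's
// world matrix plus the range that still has to be uploaded to the GPU.
class InstanceTransformBuffer {
public:
    explicit InstanceTransformBuffer(std::size_t count) : _matrices(count, identity()) {
        markDirty(0, count);  // everything must be uploaded once
    }

    // Called only when an object's transform actually changes.
    void setTransform(std::size_t i, const Mat4& m) {
        assert(i < _matrices.size());
        _matrices[i] = m;
        markDirty(i, i + 1);
    }

    // Returns how many matrices would be uploaded this frame (0 if nothing changed).
    std::size_t flush() {
        if (_dirtyBegin >= _dirtyEnd) return 0;
        const std::size_t uploaded = _dirtyEnd - _dirtyBegin;  // upload [begin, end) here
        _dirtyBegin = _matrices.size();
        _dirtyEnd = 0;
        return uploaded;
    }

private:
    static Mat4 identity() { return {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1}; }

    void markDirty(std::size_t b, std::size_t e) {
        if (b < _dirtyBegin) _dirtyBegin = b;
        if (e > _dirtyEnd) _dirtyEnd = e;
    }

    std::vector<Mat4> _matrices;
    std::size_t _dirtyBegin = static_cast<std::size_t>(-1);
    std::size_t _dirtyEnd = 0;
};
```

Separate changes are merged into one contiguous range, which trades a little extra upload for a single buffer update per frame.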

By doing it this way, we would rely entirely on the GPU for the heavy lifting (pixel shading and animations), and the CPU could focus on game logic.

One problem: how can we integrate a draw function into an InstancedMaterial class? It would become something renderable, which I don't think is a good approach here.

solan-solan commented 1 year ago

One problem: how can we integrate a draw function into an InstancedMaterial class? It would become something renderable, which I don't think is a good approach here.

I agree that it would confuse the engine architecture.

Having a draw function called for every instanced mesh defeats the purpose of instancing. A custom data buffer that stores the transformation matrices of all objects needs to be updated when, and only WHEN, an object changes its transform, gets created, or has its dirty-transform flag set. That's how you make full use of the instancing feature.

I posted a possible implementation that allows doing only one draw call. Please check it once more. What weak points or bad usage do you see in this implementation?

Add CustomCommand::DrawType enum values ARRAY_INSTANCE and ELEMENT_INSTANCE to identify the two new draw commands;
Add a new field CustomCommand::num_instance holding the number of copies this command should draw;
Add a new API void CustomCommand::setNumInstance(int num_inst), which sets num_instance and chooses the corresponding draw type as follows: if the previous draw type was ARRAY and num_instance < 2 => ARRAY, otherwise ARRAY_INSTANCE; if the previous draw type was ELEMENT and num_instance < 2 => ELEMENT, otherwise ELEMENT_INSTANCE;
Add an API to MeshRenderer or InstancedMeshRenderer to set the number of times this object should be drawn. This API will call CustomCommand::setNumInstance(int num_inst) on each mesh it contains. That way, you can call InstancedMeshRenderer::draw only once and draw multiple copies on the screen. The object needs a special material (say, InstancedMaterial) whose shader can process multiple instances (via the gl_InstanceID GLSL variable). All transformation data for each instance could be passed to this shader as a uniform array or through a float texture, and updated whenever an instance changes its position/scale/rotation.
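A hypothetical sketch of the InstancedMeshRenderer API from the last point: a single instance count is pushed down to every mesh's command, so drawing the node costs one instanced call per mesh regardless of how many copies appear. `DrawCommand` is a placeholder for `CustomCommand`, not the engine type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the per-mesh CustomCommand.
struct DrawCommand {
    int numInstance = 1;
};

// Hypothetical node that owns several meshes (one command per mesh).
class InstancedMeshRenderer {
public:
    explicit InstancedMeshRenderer(std::size_t meshCount) : _commands(meshCount) {}

    // Pushes the same instance count down to every mesh's command.
    void setInstanceCount(int count) {
        assert(count >= 1);
        for (DrawCommand& cmd : _commands) cmd.numInstance = count;
    }

    // Stand-in for draw(): returns the commands that would be submitted.
    const std::vector<DrawCommand>& draw() const { return _commands; }

private:
    std::vector<DrawCommand> _commands;
};
```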

DelinWorks commented 1 year ago

I get this implementation; it's accurate and great, there are just 2 problems:

How can we call draw on a single InstancedMeshRenderer and skip the rest? A possible but crude solution: modify the renderer so that it only adds one instance of a mesh, and if its id already exists in the renderer, the others are discarded.
gl_InstanceID with uniform arrays doesn't scale to large numbers of instances (around 100 or more), because of the GPU's shader uniform memory limit. Also, each time we add or remove an instance, we'd have to recompile the shader to fit the new array size. Solution: use instanced array buffers instead of uniforms, which can pass much larger amounts of data.
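The uniform limit can be estimated with a bit of arithmetic. `GL_MAX_VERTEX_UNIFORM_VECTORS` counts vec4 slots, its guaranteed minimum is 128 in OpenGL ES 2.0 and 256 in OpenGL ES 3.0, and a mat4 uses 4 slots. The helper below is a hypothetical back-of-the-envelope calculation; real drivers often report higher limits.

```cpp
#include <cassert>

// Upper bound on per-instance entries in a vertex shader uniform array.
// reservedVec4 covers the shader's other uniforms (view/projection, lights...).
// The array size itself must be a compile-time constant in GLSL, which is why
// changing it means recompiling the shader.
constexpr int maxInstancesInUniformArray(int maxVertexUniformVectors,
                                         int reservedVec4,
                                         int vec4PerInstance) {
    return maxVertexUniformVectors > reservedVec4
               ? (maxVertexUniformVectors - reservedVec4) / vec4PerInstance
               : 0;
}

// ES 3.0 minimum, one mat4 (4 vec4) per instance and nothing else.
static_assert(maxInstancesInUniformArray(256, 0, 4) == 64, "mat4 only");
// World + normal matrix (8 vec4) per instance, 16 vec4 reserved.
static_assert(maxInstancesInUniformArray(256, 16, 8) == 30, "world + normal");
```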

solan-solan commented 1 year ago

Thanks for checking and replying. Some thoughts on your questions:

I didn't intend for multiple InstancedMeshRenderer objects to share one mesh. I mean there would be only one InstancedMeshRenderer, which could be rendered multiple times in one draw call. If a user creates a second InstancedMeshRenderer with the same model, it will be rendered separately with a second draw call. It all depends on how we look at the OpenGL/Metal instancing feature =) As I understand it, this feature saves performance by making one glDrawElementsInstanced call instead of multiple glDrawElements calls. On the other hand, according to this link https://docs.cocos.com/creator/manual/en/engine/renderable/model-component.html , the Cocos Creator developers invented Instancing Batching, which is based on instancing. Correct me if I'm wrong, but with that implementation you can have several MeshRenderer copies with the same material and vertex buffer, which the engine groups to draw at once? That is a more complicated mechanism. But what if you want to render something like one boulder model rendered 100000 times? In that case, it is not necessary to create separate MeshRenderer objects and tell the engine to group them. You can simply create one InstancedMeshRenderer with the appropriate number of instances and set the position/rotation/scale for each instance. Maybe this is just another way of looking at the instancing feature.

I can see three use cases: A) Rendering a static object multiple times, without animation for the instances:

InstancedMeshRenderer encapsulates an array of Mat4 holding the transform matrix + normal matrix for each instance
InstancedMeshRenderer::setPosition3D(int i, const Vec3& p) will change the corresponding matrices and set an update flag so that the GL buffer/RGBA32F texture gets updated
The same for setScale/setRotationQuat/setRotation3D
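A minimal sketch of use case A, assuming per-instance position and scale only (rotation is omitted to keep the math short). `InstanceTransforms` is a hypothetical class; matrices are stored column-major with the translation in elements 12 to 14, as OpenGL expects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major 4x4 matrix

struct Vec3 { float x, y, z; };

// Hypothetical per-instance storage: world matrix + normal matrix per instance.
class InstanceTransforms {
public:
    struct InstanceData {
        Mat4 world;   // translation * scale
        Mat4 normal;  // inverse transpose of the world matrix's upper 3x3
    };

    explicit InstanceTransforms(std::size_t count)
        : _pos(count, Vec3{0, 0, 0}), _scale(count, Vec3{1, 1, 1}), _data(count) {
        for (std::size_t i = 0; i < count; ++i) rebuild(i);
    }

    void setPosition3D(std::size_t i, const Vec3& p) {
        assert(i < _pos.size());
        _pos[i] = p;
        rebuild(i);
    }

    void setScale(std::size_t i, const Vec3& s) {
        assert(i < _scale.size() && s.x != 0 && s.y != 0 && s.z != 0);
        _scale[i] = s;
        rebuild(i);
    }

    const InstanceData& get(std::size_t i) const { return _data[i]; }
    bool needsUpload() const { return _dirty; }
    void markUploaded() { _dirty = false; }

private:
    void rebuild(std::size_t i) {
        const Vec3& p = _pos[i];
        const Vec3& s = _scale[i];
        _data[i].world = {s.x, 0, 0, 0, 0, s.y, 0, 0, 0, 0, s.z, 0, p.x, p.y, p.z, 1};
        // Without rotation, the inverse transpose is just the reciprocal scale.
        _data[i].normal = {1 / s.x, 0, 0, 0, 0, 1 / s.y, 0, 0, 0, 0, 1 / s.z, 0, 0, 0, 0, 1};
        _dirty = true;  // GL buffer / RGBA32F texture must be refreshed
    }

    std::vector<Vec3> _pos, _scale;
    std::vector<InstanceData> _data;
    bool _dirty = true;
};
```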

B) Rendering a non-skinned object multiple times, with animation for the instances:

In that case, I think we should take into account that animation is processed on the CPU, so we cannot have thousands of objects anyway
We could create a vector of MeshRenderer objects that are not rendered but are used for animation processing. Each of these MeshRenderers corresponds to a particular instance. Please look at the SkinBatch class implementation at https://github.com/solan-solan/HeightMap/tree/smooth_lod_passing/adxe/Classes .
To avoid endless GL buffer updates, we can keep a std::unordered_map of the MeshRenderer* objects that are currently playing an animation. InstancedMeshRenderer::update would check this map for animated objects and update the GL buffer only in that case.
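A sketch of the "only update while something is animating" check from use case B. To stay self-contained, it keys a set by instance index instead of storing `MeshRenderer*` in a `std::unordered_map`; `AnimatedInstanceTracker` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical tracker of instances whose CPU-side animation is playing.
class AnimatedInstanceTracker {
public:
    void onAnimationStarted(std::size_t instance) { _playing.insert(instance); }
    void onAnimationFinished(std::size_t instance) { _playing.erase(instance); }

    // Returns the number of instance rows that would be re-uploaded this frame.
    std::size_t update() const {
        if (_playing.empty()) return 0;  // nothing animating: skip the upload entirely
        // A real implementation would copy each playing instance's current
        // transform into the instance buffer here.
        return _playing.size();
    }

private:
    std::unordered_set<std::size_t> _playing;
};
```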

C) Rendering a skinned object multiple times, with animation for the instances:

The shader needs more data than transformation matrices (as in SkinBatch), but bone animation is processed on the CPU in the engine, which prevents rendering a vast number of instances (since there is presumably other game logic to run as well).
It could also be implemented with a similar approach, but it requires sending the matrix palette data as well.

Generally, the instancing feature itself is good for rendering static objects with different transformations.

I agree that instanced array buffers are the best place to keep additional data. I just don't know how many places would need to change to create an additional buffer and bind it to the shader for the OpenGL/Metal backends. To render instanced objects, as I understand it, we just need to add the corresponding API to CommandBufferGL, and the same for Metal. That's why I proposed an RGBA32F texture as one possible storage.
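If an RGBA32F texture is used as storage, each texel holds one vec4, so a mat4 takes four texels. This hypothetical helper computes where column `c` of instance `i` lives for a given texture width, which is the same arithmetic a shader would do from gl_InstanceID (for example before a `texelFetch` in GLSL ES 3.00).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: instance i's matrix columns occupy texels 4*i .. 4*i+3,
// laid out row by row across a texture of width textureWidth.
struct TexelCoord { std::size_t x, y; };

constexpr std::size_t kTexelsPerMat4 = 4;  // one RGBA32F texel = one vec4 column

TexelCoord texelForMatrixColumn(std::size_t instance, std::size_t column,
                                std::size_t textureWidth) {
    assert(column < kTexelsPerMat4 && textureWidth > 0);
    const std::size_t linear = instance * kTexelsPerMat4 + column;
    return {linear % textureWidth, linear / textureWidth};
}

// Texture height needed to hold `instances` matrices at the given width.
std::size_t requiredTextureHeight(std::size_t instances, std::size_t textureWidth) {
    const std::size_t texels = instances * kTexelsPerMat4;
    return (texels + textureWidth - 1) / textureWidth;  // round up
}
```

For the 100000-boulder example, a 1024-wide texture would need 391 rows.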

Maybe this feature and its integration need more thought anyway.

DelinWorks commented 1 year ago

I've read your reply. Thanks for the in-depth explanation!

I agree with you so far, except for the point about CPU skinning: you said instancing many objects wouldn't be much of a benefit if animations are skinned on the CPU, but there's a thing called GPU skinning. It's a very broad topic and might actually be the solution for rendering instanced skinned meshes.

Also, I see your point that an InstancedMeshRenderer could render many copies of a mesh itself with only one draw call. It's just like sprite batching: you can manually set up your SpriteBatchNode and gain a bit of a performance boost, or you can rely on the engine to do it for you with automatic triangle batching. Maybe to get our feet wet, we could start with something much simpler, which is making instancing work without animations, and then branch out to instanced GPU skinning, engine-side grouping by material and vertex data, and so much more!

That's why I proposed an RGBA32F texture as one possible storage.

We could use custom layouts in GLSL shaders instead of hacking our way around with textures, but I see that you're trying to simplify the implementation, and I agree with that.

solan-solan commented 1 year ago

just like sprite batching: you can manually set up your SpriteBatchNode and gain a bit of a performance boost

The best starting point for the feature would be to implement a class similar to SpriteBatchNode, adapted to the specifics of instancing.

or you can rely on the engine to do it for you with automatic triangle batching

I checked how it works for 2D, and in my opinion it could be replicated for instancing with the following changes:

collect world/normal matrices for each instanced mesh command instead of vertex/index data, as is done inside Renderer::drawBatchedTriangles()
the material id could be calculated in the same manner, with the restriction that two materials with different color attributes are considered different
when a flush happens, update the additional attribute buffer and draw one mesh command.

Something like this, if I didn't miss anything. But in that case, the draw function will still be called for each instance, which is a bottleneck, since this has to happen even if the MeshRenderer's transformation has not changed. Maybe it is better to approach it from the other end and group similar MeshRenderers by material at creation time, to avoid processing them at runtime every frame.
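The per-frame grouping step of such an automatic batcher could look roughly like this. It is a hypothetical sketch: `MeshCommand`, `InstancedBatch`, and `groupForInstancing` are not engine types, and the grouping key (material id plus mesh id) is an assumption. It still runs every frame, which is exactly the bottleneck mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Mat4 = std::array<float, 16>;

// Hypothetical per-frame mesh command, reduced to what batching needs.
struct MeshCommand {
    std::uint32_t materialId;  // computed like the 2D batcher's material id
    std::uint32_t meshId;      // which vertex/index buffers are used
    Mat4 world;
};

struct InstancedBatch {
    std::uint32_t materialId;
    std::uint32_t meshId;
    std::vector<Mat4> worlds;  // becomes the per-instance attribute buffer
};

// Groups commands sharing material and mesh; each group could then be flushed
// with one instanced draw instead of one draw per command.
std::vector<InstancedBatch> groupForInstancing(const std::vector<MeshCommand>& cmds) {
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::vector<Mat4>> groups;
    for (const MeshCommand& c : cmds) groups[{c.materialId, c.meshId}].push_back(c.world);

    std::vector<InstancedBatch> batches;
    for (auto& [key, worlds] : groups)
        batches.push_back({key.first, key.second, std::move(worlds)});
    return batches;  // ordered by (materialId, meshId)
}
```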

GPU skinning

Yes, it's an interesting thing, but I haven't delved into this topic. Anyway, it would be a good thing for axmol :)

DelinWorks commented 1 year ago

Maybe it is better to approach it from the other end and group similar MeshRenderers by material at creation time, to avoid processing them at runtime every frame.

To build on your point: for automatic instancing, maybe we could make an InstanceCache class that does this:

Suppose we have 1000 grass models scattered in a scene. The first time, these objects get their draw function called; if an object is instanced, its material instance id is stored in the InstanceCache in a sorted map, its transformation is copied along, and the object is removed from the scene graph, so its draw function is never called again (the object is still in the scene, but the scene no longer acknowledges its existence). The draw function is only called again if the removed object's transformation changes. The renderer then goes to this class and looks at the instances and their transformations: if it's the first time they are rendered, their transformation data is moved to the GPU and rendered; otherwise, the group is just rendered.
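A rough sketch of the InstanceCache idea, assuming objects are grouped by a material instance id and leaving removal from the cache out of scope. `InstanceCache` and `Handle` are hypothetical; `render()` only reports how many groups would need a GPU upload, while every group would still be drawn with one instanced call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Mat4 = std::array<float, 16>;

// Hypothetical cache: objects register once and then leave the per-frame
// scene traversal; only groups touched since the last render are re-uploaded.
class InstanceCache {
public:
    struct Handle { std::uint32_t materialId; std::size_t index; };

    // Called on an object's first draw.
    Handle add(std::uint32_t materialId, const Mat4& world) {
        Group& g = _groups[materialId];
        g.worlds.push_back(world);
        g.dirty = true;
        return {materialId, g.worlds.size() - 1};
    }

    // Called when a cached object's transform changes.
    void update(const Handle& h, const Mat4& world) {
        Group& g = _groups.at(h.materialId);
        assert(h.index < g.worlds.size());
        g.worlds[h.index] = world;
        g.dirty = true;
    }

    // Returns how many groups had to be re-uploaded this frame.
    std::size_t render() {
        std::size_t uploads = 0;
        for (auto& entry : _groups) {
            Group& g = entry.second;
            if (g.dirty) { ++uploads; g.dirty = false; }  // upload g.worlds here
            // issue one instanced draw with g.worlds.size() instances here
        }
        return uploads;
    }

private:
    struct Group { std::vector<Mat4> worlds; bool dirty = false; };
    std::unordered_map<std::uint32_t, Group> _groups;
};
```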

This is just a rough thought; I don't know whether it is a viable solution for automatic instancing, but I think it's quite handy. The first frame will take a HUGE time to render because of instance caching, but that can be treated as a loading screen!

solan-solan commented 1 year ago

To build on your point: for automatic instancing, maybe we could make an InstanceCache class that does this

Yes, I'm looking at it as well. I would just like to note that engine auto-batching for 3D objects and a class like SpriteBatchNode are different things from the user's point of view at the API level. Of course, both mechanisms will rely on one backend implementation, which is instancing at the low level. So it would be good to implement something simple first to pave the way :-), like a SpriteBatchNode-style implementation, and then integrate more complex things.

And, if I understand correctly, it would be good to embed InstanceCache into the existing MeshRenderer/MeshMaterial to avoid an extra class hierarchy?

solan-solan commented 1 year ago

I have done some tests of the instancing feature. Specifically, I added an instance array buffer as an additional vertex buffer for the vertex shader, set the divisor to 1 via glVertexAttribDivisor for the relevant attributes, and could properly render object instances with glDrawElementsInstanced. This was done on Windows.
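For reference, this is how a per-instance mat4 attribute maps onto vertex attribute slots when a divisor of 1 is used, as in the test described above. A mat4 vertex attribute occupies four consecutive locations, one per column. The helper only computes the parameters; `baseLocation` and the `AttribSlot` struct are assumptions, and the GL calls appear only in comments. Each location also needs `glEnableVertexAttribArray`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each slot, a GL backend would call (with the instance buffer bound):
//   glEnableVertexAttribArray(location);
//   glVertexAttribPointer(location, 4, GL_FLOAT, GL_FALSE, stride, (void*)offset);
//   glVertexAttribDivisor(location, 1);  // advance once per instance, not per vertex
struct AttribSlot {
    unsigned location;
    std::size_t offsetBytes;
    std::size_t strideBytes;
    unsigned divisor;
};

std::vector<AttribSlot> mat4InstanceAttribute(unsigned baseLocation) {
    constexpr std::size_t kColumnBytes = 4 * sizeof(float);  // one vec4 column
    constexpr std::size_t kStride = 4 * kColumnBytes;        // one mat4 per instance
    std::vector<AttribSlot> slots;
    for (unsigned col = 0; col < 4; ++col)
        slots.push_back({baseLocation + col, col * kColumnBytes, kStride, 1u});
    return slots;
}
```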

The Android application could not be linked, since glVertexAttribDivisor/glDrawElementsInstanced are not exported by the NDK's libGLESv2.so. But it could be linked with the NDK's libGLESv3.so, and everything works as expected on a Xiaomi 11 Lite 5G NE.

I went back to libGLESv2.so and tried eglGetProcAddress to get the glVertexAttribDivisorEXT/glDrawElementsInstancedEXT addresses. They were found, but the following issue occurred at runtime (by the way, these functions are not exported from the NDK's libGLESv2.so in my case; they are absent from llvm\prebuilt\windows-x86_64\sysroot\usr\lib\aarch64-linux-android\31\libGLESv2.so, maybe that is the wrong path?):

There are no GL_EXT_draw_instanced/GL_EXT_instanced_arrays extensions on my phone; maybe that is the reason? But as mentioned before, glVertexAttribDivisor/glDrawElementsInstanced are supported in GL ES 3.0.

It is worth mentioning that neither gl_InstanceID nor gl_InstanceIDEXT (perhaps because the proper extension is absent in my case?) works in GLSL ES 1.0 shaders.

Here is some info about GL ES 2.0 with instancing, which I tried: https://stackoverflow.com/questions/25387959/instanced-drawing-with-opengl-es-2-0-on-ios https://stackoverflow.com/questions/28041936/use-of-undeclared-identifier-gl-instanceid

Maybe you plan to support ES 3.0 in the near future, since it is actually backward compatible with GL ES 2.0/GLSL ES 1.0, or am I missing something?

DelinWorks commented 1 year ago

@solan-solan you can visit https://docs.gl to see which functions are supported in which OpenGL API version.

glVertexAttribDivisor is only supported in ES3, not ES2. Also, none of the instanced rendering functions are supported in ES2.

So is ES3 support on Android phones present in axmol? I don't really know about OpenGL on Android phones, but since libGLESv2.so is loaded, it might be that axmol actually doesn't support ES3, and it might be fixed with just a library swap or a header define to use libGLESv3.so.

@halx99 do you know anything about this? Do Android projects only use ES2? And how hard is it to implement ES3?

solan-solan commented 1 year ago

@DelinWorks Yes, you are right that this feature is fully supported only in ES3; I just hoped it would be accessible via an ES2 extension. But it looks like these extensions are not very common.

So is ES3 support on Android phones present in axmol?

To get a fully working Android example with glVertexAttribDivisor/glDrawElementsInstanced, I made only one change in this config file and built the project with Android Studio:

PS: It also seems that I needed to declare the above functions, since they are not present in the ES2 include files.

halx99 commented 1 year ago

Refer to https://github.com/google/angle : the .so name is always libGLESv2, but GLES 3.0 is implemented on Android.

solan-solan commented 1 year ago

@halx99

Refer to https://github.com/google/angle : the .so name is always libGLESv2, but GLES 3.0 is implemented on Android.

Could you please clarify this a little? Currently there is an ANGLE build for Windows in cocos2d/thirdparty/angle. Do you plan to add an ANGLE build for Android named libGLESv2.so that would implement GLES 3 (and the same on top of Vulkan)? Do I understand correctly? And what do you think about performance? I found this article: https://vulkan.org/user/pages/09.events/vulkanised-2023/vulkanised_2023_angle_as_a_system_graphics_driver.pdf It looks like it could affect FPS, and the performance impact seems to depend on the phone's CPU. This approach is best for Vulkan, but wouldn't it be an overhead for GLES3?

rh101 commented 1 year ago

OpenGL ES support distribution across Android devices: https://developer.android.com/about/dashboards/index.html#OpenGL

solan-solan commented 1 year ago

ES v2 is gradually becoming a thing of the past :)

DelinWorks commented 1 year ago

@solan-solan I think I just noticed that ES v3 is indeed being loaded on my Android device (just as @halx99 mentioned)! It's just that the library's name is libGLESv2.so; some devs forgot to change the name of the object file or something, I guess. Also, you seem to have loaded glDrawElementsInstancedEXT; have you tried glDrawElementsInstanced directly, without getting the proc address? Or you can instead get the proc address for glDrawElementsInstancedARB, because allegedly it has wider support on older/newer devices. I will try to tinker with it, but I'm short on memory (only 16 GB, and Android Studio is fat), so if my new memory modules arrive I'll update you on that!

In other words, you should prefer glDrawElementsInstancedARB or the core glDrawElementsInstanced function over glDrawElementsInstancedEXT. When using glDrawElementsInstancedARB or glDrawElementsInstanced, you do not need to enable any extensions explicitly, as they are part of the core OpenGL specification. However, if you are targeting an older system that supports neither of them, glDrawElementsInstancedEXT may be your only option.
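The fallback order suggested here (core name first, then the extension-suffixed name) can be written as a small resolver. `getProc` stands in for `eglGetProcAddress` so the logic runs without an EGL context, and `resolveFirst` is a hypothetical helper. Note that a non-null pointer from `eglGetProcAddress` does not by itself prove the function is supported, which may be related to the runtime failure described earlier, so the extension string or context version should be checked too.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <string>

// Stand-in for eglGetProcAddress.
using ProcLookup = std::function<void*(const char*)>;

// Tries each name in order and returns the first non-null address.
void* resolveFirst(const ProcLookup& getProc, std::initializer_list<const char*> names,
                   std::string* resolvedName = nullptr) {
    for (const char* name : names) {
        if (void* p = getProc(name)) {
            if (resolvedName) *resolvedName = name;
            return p;
        }
    }
    return nullptr;  // not available: instanced drawing must be disabled
}
```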

PS, The libGLESv2.so doesn't refer to the OpenGL ES version, it is named after the version of the EGL specification it implements, rather than the version of the OpenGL ES specification that it supports.

solan-solan commented 1 year ago

@DelinWorks

I will try to tinker with it, but I'm short on memory (only 16 GB, and Android Studio is fat), so if my new memory modules arrive I'll update you on that!

I know exactly what you are talking about :) It would be great if you could tell me what you dig up, in case of success.

The thing is, I get an error at the linking stage when I use libGLESv2.so for glDrawElementsInstancedEXT/glDrawElementsInstancedARB/glDrawElementsInstanced. The names of these functions are absent from libGLESv2.so at Android\Sdk\ndk\23.2.8568313\toolchains\llvm\prebuilt\windows-x86_64\sysroot\usr\lib\x86_64-linux-android\31\libGLESv2.so, which means they are not exported by name from the NDK's libGLESv2.so.

On the other hand, glDrawElementsInstanced exists in libGLESv3.so.

DelinWorks commented 1 year ago

That is really weird... I think I confused it with libEGLv2.lib, and it may indeed need libGLESv3.so to work! Have you tried making the app load libGLESv3.so instead of v2? If it's as simple as that 😅

DelinWorks commented 1 year ago

I guess 16 GB of memory REALLY isn't enough... I'll try to get it to work.

DelinWorks commented 1 year ago

I tested it and it built successfully, and it actually loads OpenGL ES 2.0, but Samsung debugging told me ES v3; that's odd...

D/axmol debug info: {
        supports_discard_framebuffer: false
        supports_ATITC: false
        supports_OES_map_buffer: false
        max_vertex_attributes: 16
        supports_PVRTC: false
        supports_OES_packed_depth_stencil: true
        axmol.compiled_with_profiler: false
        supports_NPOT: true
        supports_ETC1: true
        renderer: Android Emulator OpenGL ES Translator (NVIDIA GeForce RTX 3070/PCIe/SSE2)
        supports_OES_depth24: true
        axmol.build_type: DEBUG
        max_samples_allowed: 0
        supports_ETC2: true
        vendor: Google (NVIDIA Corporation)
        axmol.version: axmol-1.0.0
        axmol.compiled_with_gl_state_cache: true
        supports_ASTC: true
        version: OpenGL ES 2.0 (4.5.0 NVIDIA 531.41)
        max_texture_units: 192
        max_texture_size: 32768
        supports_vertex_array_object: true
        supports_S3TC: false
        supports_BGRA8888: false
    }

solan-solan commented 1 year ago

Have you tried making the app load libGLESv3.so instead of v2?

I tried it and it built successfully, but I haven't tested cpp-tests, only my HeightMap example. The feature worked. The main thing, I think, is that the bundled GLESv3 library with GLSL ES 1.0 shaders works fine.

I tested it and it built successfully, and it actually loads OpenGL ES 2.0, but Samsung debugging told me ES v3; that's odd...

Or is it the other way around? =))

I checked on a real device (Xiaomi 11 Lite 5G NE) and everything was OK. Are you using the emulator on Windows? Maybe that's the reason?

DelinWorks commented 1 year ago

@solan-solan If you go to the Android Java sources, open AxmolActivity.java, and add this.mGLSurfaceView.setEGLContextClientVersion(3); at line 270, it will force OpenGL ES v3.0, but then the library complains at runtime that it doesn't support ES v3. I think we can just replace libGLESv2.so with libGLESv3.so and reflect these changes in CMake.

I checked on a real device (Xiaomi 11 Lite 5G NE) and everything was OK. Are you using the emulator on Windows? Maybe that's the reason?

Ah! It might be that the platform chooses v3 but the emulator only supports up to v2. I'll see whether this.mGLSurfaceView.setEGLContextClientVersion(3); works on a real device rather than the emulator.

solan-solan commented 1 year ago

@DelinWorks this.mGLSurfaceView.setEGLContextClientVersion(3): OK, but judging by the name, it is related to eglGetProcAddress. In any case, something has to load v3 into the address space :) And if so, we can't use v3 functions directly, only via eglGetProcAddress?

DelinWorks commented 1 year ago

So is the solution simply to link libGLESv3.so directly, so that glad.h would have these functions as core rather than calling eglGetProcAddress?

solan-solan commented 1 year ago

I think yes, only the library itself and all the v3 functions in the headers are needed. It could be an additional backend that can be chosen at project generation time, provided there are no issues in cpp-tests/lua-tests, and if it does not contradict the roadmap and the main direction of development.

Sorry if I'm looking too far ahead =))

If you mean this particular feature, then yes: the library and two functions in the header.

DelinWorks commented 1 year ago

I think it should be a define in the code or a CMake option like ANDROID_USE_GLES3 to choose which API to use on Android; that way it won't interfere with other plans on the roadmap. It will also be easier to switch back if the device doesn't support v3, and features like instancing will be disabled internally.

solan-solan commented 1 year ago

Yes, I agree with you.

It will also be easier to switch back if the device doesn't support v3, and features like instancing will be disabled internally.

I would prefer to have two different builds of one application in the store, one requiring v3 and the other v2, without switching backends. Using v3 will affect code logic; it would not be simple to restrict game logic depending on which backend the device supports.

DelinWorks commented 1 year ago

What if the device doesn't support v3: how can you render with instancing? You can either not render the instances, which is a naive solution, or just render them with multiple glDraw calls. I might not have understood what you said.
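A sketch of the second option: when instancing is unavailable, draw the same mesh once per instance. `DrawRecorder` is a hypothetical stand-in that counts calls instead of issuing `glDrawElements`/`glDrawElementsInstanced`, and the `instancingAvailable` flag would come from a build-time switch such as the `ANDROID_USE_GLES3` option suggested earlier, or from a capability check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical backend stand-in that records draw calls.
struct DrawRecorder {
    std::size_t drawCalls = 0;
    std::size_t instancesDrawn = 0;
    void drawElements() { ++drawCalls; ++instancesDrawn; }  // ~glDrawElements
    void drawElementsInstanced(std::size_t n) {              // ~glDrawElementsInstanced
        ++drawCalls;
        instancesDrawn += n;
    }
};

void drawCopies(DrawRecorder& gl, std::size_t instances, bool instancingAvailable) {
    if (instancingAvailable) {
        gl.drawElementsInstanced(instances);  // one call for all copies
        return;
    }
    for (std::size_t i = 0; i < instances; ++i) {
        // Without gl_InstanceID, each copy's transform would be set as a
        // regular uniform before its own draw call.
        gl.drawElements();
    }
}
```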

solan-solan commented 1 year ago

I mean that an application which uses instancing should not target devices that do not support v3. There are many more devices with v3 support now, especially those whose owners want to play games. Otherwise, you can use v2 and reduce the number of rendered objects, but build it as a separate application.

DelinWorks commented 1 year ago

Yes! That's what I was trying to say: you choose the API at build time, not at run time, because if someone wants to support v2 only, they can do it with a simple define.

axmolengine / axmol

Instancing proposal #1043