godotengine / godot-docs

Godot Engine official documentation
https://docs.godotengine.org

Document how to interpret mesh vertex count in the editor #4400

Open dioptryk opened 3 years ago

dioptryk commented 3 years ago

Edit by @akien-mga:

This was originally filed as a bug report but it's a documentation issue. See @clayjohn's writeup at https://github.com/godotengine/godot-docs/issues/4400#issuecomment-731634223 which needs to be integrated somehow in the docs.


Godot version: 3.2.3.stable.official

OS/device including version: Windows 10 Pro Build 19041 GLES3, GTX970

Issue description: Hello fellow Godotians,

I found a nice tree model for a forest, which I'm implementing using MultiMeshInstance. The model (an OBJ) has 50k vertices and 100k faces, as shown by Blender, MeshLab and Notepad (I counted the vertex definition rows). The model imports without a problem and looks exactly as it should, but the performance is abysmal. 100 of these trees (and nothing else besides that) gives me 30 FPS.

forest

My jaw dropped when I saw the Performance.RENDER_VERTICES_IN_FRAME value, which is around 200 MILLION. In addition, when viewport info is enabled in the editor, it shows the model as having around 600k vertices, which is more than 10x the count of the model on disk! I've tried exporting from Blender using different formats, but nothing changes. The model itself is shown as having more than ten times the number of vertices, while the number of rendered vertices is even higher, because for 200M vertices each model would have to be made of 2M vertices. I've no idea what's happening here and it's driving me crazy. Please advise.

Steps to reproduce: Just import the attached OBJ and check polycount. old_pine_trunk_clean.zip

Minimal reproduction project: Just the model.

Zireael07 commented 3 years ago

In editor, viewport info cheats by also counting the vertices of the grid if you have it on. I am not 100% sure but it might also be counting vertices of any collision outlines and the like. But it definitely counts the grid.

As for what is happening in-game, I have no idea.

dioptryk commented 3 years ago

@Zireael07 that was a good guess, but the grid is only a few hundred verts, which can be checked by hiding it. I've also tried switching import options (there aren't many for OBJ, just tangents, scaling and optimization), but after the re-import the best I got was 300k (a 50% reduction), which is still 6x too many. ...

Calinou commented 3 years ago

Can you upload the whole project? Keep in mind using many lights will decrease performance, especially when using the GLES2 renderer as a multi-pass approach is used there.

Also, if your model uses sharp faces (instead of smooth faces), every vertex will be duplicated. Consider marking your object as smooth in blender and enabling Auto Smooth in the mesh settings so that faces are automatically smoothed based on angle.

Either way, I'd argue 50K vertices for a tree model is quite excessive, even for its highest detail variant. You definitely should not be using such a detailed model for trees seen from a distance! I'd target 10K vertices at most for the tree model when viewed from up close. For trees seen from a distance, use a less detailed mesh with 1K-3K vertices. You can use the godot-lod add-on to achieve this.

Zireael07 commented 3 years ago

@dioptryk: I wasn't saying the grid was the only contributor, but that I noticed that it was counted because most of my models are very low-poly, at most 1000 verts and the editor was saying roughly double that :P

dioptryk commented 3 years ago

@Calinou I'll try to upload this later. Still, it can be recreated by simply making a multimesh with this mesh and populating a plane. I don't care for GLES2 :-) and the tree was already significantly simplified using MeshLab, but I see no reason not to decrease vertex count further :)

There's only a single directional light. Material has only a diffuse texture channel and is NOT transparent.

I may try having the visible trees calculated per frame, so that not everything is rendered, but it may be a bit complicated, since the idea is for the entire level to be dynamically generated. Still, I wasn't expecting the performance drop I got.

Some comparison images from tooling below.

tree1 tree2 tree3

Calinou commented 3 years ago

@dioptryk That topology is quite excessive. You really don't need your models to be this detailed; it just has no benefit at this point, not even when viewed from up close on a 4K display.

dioptryk commented 3 years ago

@Calinou I agree, but still this is only a prototype (I just found out about photogrammetry, which is by default kinda highpoly ;) and wanted to try the models I found on the net in Godot ), and I'm really concerned about what's happening. I plan heavily using the multimesh, but if all the meshes will "explode" like that, it will be a significant problem.

dioptryk commented 3 years ago

I've tested another model, a treasure chest (https://sketchfab.com/3d-models/medieval-chest-037b03a3e0274279be4b93b7c7cedf01), and the result is similar (this time for GLTF): Blender shows 6k vertices, while Godot shows 46k!

If this is confirmed, then wouldn't it mean that everyone using Godot has reduced performance because of "inflated" meshes?

Zireael07 commented 3 years ago

I found an old issue that looks related: https://github.com/godotengine/godot/issues/25957

dioptryk commented 3 years ago

@Zireael07 thank you for finding this, this could be it. I've attached RenderDoc single frame dump with the chest from my previous post.

chest chest2

If I'm reading this correctly (could someone confirm, this is my third time using RenderDoc, I think ;) ), glDrawElements rendering 21k elements from array means it's rendering 7k triangles, so it seems this is the same as Blender shows?

lawnjelly commented 3 years ago

Just some points if you were not aware:

Num vertices on import

Number of vertices in a modelling program is not the same as the number of vertices required to render in a game engine. In a modelling program vertex positions are often stored separately from uvs, and normals.

In a typical game engine they are used in interleaved format, so they require unique vertices. This means that one vertex position using 3 uvs will require at least 3 unique verts. Same with normals etc. Often double sided faces are rendered with double the geometry, with normals pointing in opposite directions. Faces are also usually split into triangles.
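The splitting described above can be illustrated with a minimal sketch. This is not Godot's importer; the function name `build_render_buffers` and the data layout (a face corner as a tuple of position, UV and normal indices) are assumptions made up for illustration. It only shows why a position that is shared in the modelling tool can turn into several render vertices once a UV seam is involved.

```python
from typing import Dict, List, Tuple

# One face corner as a modelling tool might store it: separate indices
# into the position list, the UV list and the normal list.
# (Hypothetical layout, chosen for illustration only.)
Corner = Tuple[int, int, int]


def build_render_buffers(corners: List[Corner]) -> Tuple[List[Corner], List[int]]:
    """Collapse face corners into unique render vertices plus an index array.

    Two corners become the same render vertex only if position, UV and
    normal all match.
    """
    unique: Dict[Corner, int] = {}
    vertices: List[Corner] = []
    indices: List[int] = []
    for corner in corners:
        if corner not in unique:
            unique[corner] = len(vertices)
            vertices.append(corner)
        indices.append(unique[corner])
    return vertices, indices


# A quad made of two triangles: 4 positions in the modelling tool.
# Without a seam, the shared edge (positions 1 and 2) reuses the same UVs.
quad_no_seam = [
    (0, 0, 0), (1, 1, 0), (2, 2, 0),  # triangle A
    (2, 2, 0), (1, 1, 0), (3, 3, 0),  # triangle B
]
# With a UV seam along the shared edge, triangle B uses different UV
# indices for positions 1 and 2, so those positions must be duplicated.
quad_with_seam = [
    (0, 0, 0), (1, 1, 0), (2, 2, 0),  # triangle A
    (2, 4, 0), (1, 5, 0), (3, 3, 0),  # triangle B
]

verts_a, idx_a = build_render_buffers(quad_no_seam)
verts_b, idx_b = build_render_buffers(quad_with_seam)
print(len(verts_a), len(idx_a))  # 4 6
print(len(verts_b), len(idx_b))  # 6 6
```

In both cases the index array has 6 entries (2 triangles * 3), while the number of unique render vertices depends on how many attribute combinations each position takes part in.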

Render passes

Although you may only see one object on screen it may be rendered in several passes. Each shadow map requires a separate render, so each light casting shadows, and some lights may require multiple passes, e.g. directional light with splits. There is also z only pre-pass, I'm not sure if this is counted in the total.

slapin commented 3 years ago

I can totally confirm the observation. Editor viewport contents do not add much either: I populate a MultiMeshInstance at run time with 256 trees of 200 verts each and get a 50M vertex count, which is kind of confusing. I dropped MultiMeshInstance and replaced it with GridMap, and now have a sane 1M polys with 1000 trees, so I can only suggest doing the same and dropping MultiMesh.

slapin commented 3 years ago

@lawnjelly that is usually very low increase, if any. Can't be the case here.

lawnjelly commented 3 years ago

Don't get me wrong, there may be something else going on too that shouldn't be happening.

In these kinds of cases this proposal would help, at least to be easily able to see the vert counts in a mesh instance: (I don't really know enough about the UI side to implement this) https://github.com/godotengine/godot-proposals/issues/248

slapin commented 3 years ago

It is easy to get the actual vertex count of a mesh resource using the APIs, to compare. I guess there is something really wrong with MultiMesh, as this problem never happens on standalone meshes or GridMap.

lawnjelly commented 3 years ago

Edit: This turns out to be incorrect because the verts in the monitor is the number of indices, not the number of source verts but left for posterity...

It seems like the raw mesh is using 300K verts, so it looks like each vertex in each separate face is coming out unique. The obj also contains tex coords so this could conceivably be legit, however I tried reexporting it from blender without uvs and the mesh still appears to use 300K verts in godot.

So it does suggest something might be going wrong. So I decimated the mesh to 2214 verts, and exported without uvs or normals, 6642 verts are drawn, even with z prepass off and no lights.

Without being able to directly get the vertex count without writing some gdscript, I can't discount something in the renderer but it is possible the OBJ importer is importing it incorrectly.

I haven't really got blender setup, but something else to try would be importing it in a different format. It could well be a problem just with the obj importer.

Lexpartizan commented 3 years ago

In the viewport, with a regular MeshInstance, I get 160,000 vertices for a model with 13,000 vertices, 26,000 triangles, and one UV.

If I call surface_get_arrays(0), then I get about 14,500 vertices. I consider this a normal value for 13,500 verts in Blender, since the vertices on the UV seams are duplicated. But not 160,000. @lawnjelly, given that I am counting vertices by getting the arrays from an already imported model, I think that everything is fine with the import.

This model is from the MakeHuman project. I ignored this discrepancy, thinking that I simply did not understand how vertices are counted or that it was an error in how the information is displayed. However, I would like to clarify this situation. This would help me optimize my models or alert the community to possible engine errors that might affect performance.

dioptryk commented 3 years ago

> In a typical game engine they are used in interleaved format so require unique vertices. This means that one vertex position using 3 uvs will require at least 3 unique verts.

@lawnjelly

Are you absolutely sure? Because from what I remember from my OpenGL days, you just need to define a custom vertex format if you have multiple UVs, tangents or whatever... you do NOT replicate vertices. For example, see https://learnopengl.com/Getting-started/Textures. If anything, it could be other way around, 3d modeling programs would use multiple vertices because it would be easier to modify them. This also makes them use more memory. Please correct me if I'm mistaken here.

My RenderDoc snapshot would suggest that Godot does things right in regards to rendering itself, it's just the statistics is presented incorrectly. This, however, also means that using even a single highpoly model with instancing is a big no-no, which is a bit underwhelming. I certainly didn't expect that mere 50k vertices would be a problem for instancing.

lawnjelly commented 3 years ago

You can do both, but generally GPU guys tend to recommend to use interleaved because it is more cache friendly and easier for the hardware. e.g. https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html

This is likely to be especially important for larger models. I'm not that familiar with Godot's 3d rasterizer but I got the impression it just used interleaved (that's how you fill verts when creating custom geometry). I think there's a section in the docs on this but I forget where.

Perhaps @clayjohn can chime in, I'm only really familiar with the 2d side, I haven't looked at the details of the 3d rasterizer yet.

> My RenderDoc snapshot would suggest that Godot does things right in regards to rendering itself, it's just the statistics is presented incorrectly.

Yes that's another possibility.

Lexpartizan commented 3 years ago

I, in turn, assume that the models are imported correctly, since when I get the array of vertices from the mesh, its size matches the expected value (14,000), not the displayed one (160,000). Are you sure that RenderDoc shows that vertices and polygons are not drawn multiple times? 21672 draw elements: maybe these are triangles, not vertices? Unfortunately, I am not familiar with RenderDoc. I also think that the problem is more with the statistics, but we would all like to find the reasons for the low performance, so we are a little tense now.

mrjustaguy commented 3 years ago

Depth Pass under material properties cuts down the vert count in half if set to never (100k vert sphere, from 600k vert to 300k vert) or you can make it transparent.. or shadow to opacity flag, possibly some others.. on that note also.. one other thing changes with the vert count drop.. everything aside from objects drawn in fact drops by 1 along with the half verts...

Lexpartizan commented 3 years ago

Wait a minute, let's go back to RenderDoc and the chest. I went to the Khronos website and looked at glDrawElements. https://www.khronos.org/registry/OpenGL-Refpages/es2.0/xhtml/glDrawElements.xml

void glDrawElements(GLenum mode, GLsizei count, GLenum type, const void * indices);

mode - Specifies what kind of primitives to render. Symbolic constants GL_POINTS, GL_LINE_STRIP, GL_LINE_LOOP, GL_LINES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, and GL_TRIANGLES are accepted. In our case, GL_TRIANGLES.

count - Specifies the number of elements to be rendered. In our case, these elements are GL_TRIANGLES. So it turns out that for a model with 7622 triangles, 21672 triangles are drawn.

Is it possible to load this model in blender, unity or unreal and take Renderdoc screenshot from there to compare the number of triangles?

lawnjelly commented 3 years ago

@clayjohn is going to write later but he's confirmed that the stats can be confusing because they are showing the number of indices drawn, not the number of source verts in the model. This makes sense, but using the name 'vertices' is a little vague and can lead to confusion as we see here.

As I say in the proposal, to make things more obvious it would be nice to be able to see the stats, the number of source verts, number of tris, number of indices etc when the mesh instance is selected in the IDE.

dioptryk commented 3 years ago

@Lexpartizan Khronos really should fix their docs... count is NOT the number of triangles, but the number of indices in the referenced array; this is also hinted at by the indices parameter. See https://community.khronos.org/t/gldrawelements-count-question/31276. So, we divide by 3 and everything seems fine.
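As a quick check of that arithmetic, here is a small sketch. The helper name `triangles_from_index_count` is made up for illustration, and it assumes the draw mode is GL_TRIANGLES, where every three indices form one triangle.

```python
def triangles_from_index_count(index_count: int) -> int:
    """Number of triangles drawn by glDrawElements(GL_TRIANGLES, count, ...).

    With GL_TRIANGLES, `count` is the number of indices, and every three
    consecutive indices describe one triangle.
    """
    if index_count % 3 != 0:
        raise ValueError("GL_TRIANGLES expects an index count divisible by 3")
    return index_count // 3


# The count reported by RenderDoc for the chest model in this thread.
print(triangles_from_index_count(21672))  # 7224
```

That is roughly the 7k triangles mentioned earlier for the chest, so the draw call itself looks reasonable.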

Lexpartizan commented 3 years ago

@dioptryk Thank you for the explanation.

dioptryk commented 3 years ago

@Lexpartizan I just tried to attach RenderDoc to Blender out of curiosity, but it immediately crashes after trying to open model import menu. Perhaps some voodoo is required for this to work :)

dioptryk commented 3 years ago

@lawnjelly Thank you, that explains it. I couldn't find this in the code. So, the only question which remains, from my perspective: is the big performance drop expected? Is 50k vertices really that much for instancing? Maybe someone did tests with various models? I may try that during the weekend.

Perhaps there's some threshold here from which the performance suddenly goes down? I simplified my tree down to <1k vertices and everything runs smooth as butter, but this was just an example model.

If you guys consider that unanswerable, please just close this thread.

Zireael07 commented 3 years ago

@dioptryk: Godot 3's performance in 3D is not amazing by any definition. You need to be very careful with things like lights/shadows (especially the latter), but I have around 1 mil verts (what is returned by the vertices count function in the Performance class) on screen at most times, no MultiMesh involved, and I can hit 30 fps. At night, when shadows are disabled, this jumps up to 40ish-50. Interestingly, switching to a camera that looks down from the top does not improve the fps visibly.

dioptryk commented 3 years ago

@Zireael07 Yes, shadows seem to matter the most, especially because a single multimesh renders everything all the time. When I returned to the original 50k model, tripled the number of trees, but disabled shadows, I got the same FPS as with original number of trees and shadows enabled. So, by simple calculation, disabling shadows tripled my FPS in this case. I guess if I have too many instanced models, even if they're very simple, then the shadows will be an FPS killer. So, it seems when implementing an instanced forest with shadows, dynamic culling/visibility is a must.

I just got an idea.. render complex objects, but cast simple shadows from simple invisible meshes, positioned at the same places :D I'll try it when I encounter performance problems again.

lawnjelly commented 3 years ago

> I just got an idea.. render complex objects, but cast simple shadows from simple invisible meshes, positioned at the same places :D I'll try it when I encounter performance problems again.

See: https://www.reddit.com/r/godot/comments/i2j45k/quick_tip_shadow_impostors_and_vertex_count/

(there's never any new ideas, someone always thought of them in a research lab in the 70s, but they didn't become popular at the time... :grin: )

slapin commented 3 years ago

But why using GridMap instead of MultiMesh gets fps back to 60?

Lexpartizan commented 3 years ago

Yes, what I get in RenderDoc is exactly the same as mesh.surface_get_arrays(0)[Mesh.ARRAY_INDEX]. Unfortunately, I was not able to connect RenderDoc to Blender or other programs where I could check. But it is obvious that the total number of these render elements corresponds exactly to the size of the index array. In any case, the viewport shows a much larger number (160,000) than the size of the index array (80,000).

clayjohn commented 3 years ago

@slapin multimesh draws all instances in a single draw call so none of the instances get culled. Gridmap draws the instances separately so it takes advantage of culling.

@Lexpartizan as lawnjelly explained above, every vertex is rendered twice, once in the depth-prepass and once during rendering.

slapin commented 3 years ago

@clayjohn but why the same number of instances produce so large difference in polycount multimesh vs gridmap?

slapin commented 3 years ago

@clayjohn but doesn't prepass go in separate drawcall?

smix8 commented 3 years ago

> So, the only question which remains, from my perspective: is the big performance drop expected? Is 50k vertices really that much for instancing?

Yes and yes if you use so many of them in the scene as background props.

If you look at the topology screenshot, that is the level of mesh detail used for baking a normal map for a few minutes, or in a raytracing 3D rendering suite that renders for a few hours. Definitely not for a gameplay 3D model that should be used as a background prop in a realtime game engine that wants to run at 30-60+ frames a second.

50k for a single tree trunk without leaves is a crazy amount. Just for comparison, main characters in many modern games have around 100-250k at LOD0, but half of that is the hair prop and the equipment, and they take up 1/3 or more of the screen in close-ups ... so decimate or stomp that tree asset for good.

clayjohn commented 3 years ago

@slapin the poly count comes from the number of trees that are drawn in the frame. If a tree is culled, it doesn't get drawn so it doesn't contribute. But multimeshs can't cull individual instances, either the whole thing is culled or none are culled.

Yes. The depth pre-pass is a separate draw call.

slapin commented 3 years ago

Well, one could live with assets with this much detail if Godot had LOD support and occlusion culling, but having the polycount balloon this much in the engine is still insane: you get millions of polygons from thin air... So you have to plan your polycount knowing that it will be at least doubled or tripled by the engine. Or even more.

@clayjohn but in my example the polycount by multimesh is bigger than when all objects drawn separately and all visible, does it mean even backface and frustum culling does not apply to them? I basically create 16 by 16 grid of the same 200 vertices plant, with separate objects I get about ~48000 vertices, with gridmap I get the same ~48000 vertices, but with multimesh I get 4M-16M vertices (fps drops to 40-48 fps on i7 2600k) and if I make multimeshes smaller (use 8x8 or 4x4 multimeshes) I don't see any gain from that. Any ideas? Also why for large number of objects (like 1000) gridmap stays on 60 fps but separate objects drop to 48 fps? what could lead to that? Why gridmap provides any performance gain?

dioptryk commented 3 years ago

Some tests: for fun I checked the same forest scene with 330 trees (the 50k-vertex version) distributed randomly and shadows enabled (single directional light, PSSM4). I compared a multimesh to a mesh instance implementation (so a single multimesh vs 330 mesh instances).

- Multimesh: 11 FPS
- Mesh: 50 FPS (mostly, since culling is in effect, so it depends on where you look)

And for a <1k vertices tree:

- Multimesh: 220 FPS
- Mesh: 250 FPS

So, depending on your scene, it may be actually easier and faster to use a Mesh node per tree :-) You get culling, can easily implement LOD and it's easier to randomize the trees, which is what I want to do (this would require multiple Multimeshes, like GridMap does internally).

This is a quite specific scene, mind you, since it has fog and visibility is low. Still, whatever works, I guess.

Calinou commented 3 years ago

@dioptryk 330 individual meshes isn't that much. The resulting number of draw calls is quite affordable on desktop systems.

dioptryk commented 3 years ago

Scene after some more experiments, 330 trees (trunks have <1k vertices, the gnarly ones have 4k), all using Mesh rather than MultiMesh. Foliage is MultiMesh, though. Every tree has slightly randomized scale and translation (up to one tree on 1x1 grid). 140 FPS in fullscreen on 2560x1440 with MSAA. Camera range is 10.

forest1

slapin commented 3 years ago

I have same problem with foliage but the number of meshes end up rather large, so I have no way to get sane fps. Gridmap for some reason gives reasonable 60 fps for 256 polygon grass chunks. I was not able to get 60 fps with multimesh.

Zireael07 commented 3 years ago

Gridmap does culling, multimesh doesn't @slapin

slapin commented 3 years ago

@Zireael07 I know, but why Gridmap ends up better than standalone meshes?

clayjohn commented 3 years ago

I will attempt to write a comprehensive answer here. There are 2 questions being discussed in this thread and I will answer them separately.

Why is RENDER_VERTICES_IN_FRAME so high?

There are two pieces to this 1) the number of vertices in the model is not the same as the number that needs to be rendered and 2) the game engine needs to render the model multiple times to draw a single frame.

1) RENDER_VERTICES_IN_FRAME represents how many vertices are drawn in the frame. The vertex count reported by modelling programs like Blender represents the number of vertices in the model itself. For performance reasons (which @lawnjelly describes above) game engines render vertices in an interleaved format, meaning that each vertex needs to be specified for each face it is drawn in. Accordingly, the number of vertices is actually the number of faces * 3, which in game engine terms is the number of indices in the index array. In OP's situation, the tree has 100k faces, so each tree is 300k game engine vertices, not 50k vertices like in Blender. TLDR; Blender reports the number of unique vertices, but at render time, you need to render each vertex for each time it is used.

total vertices so far 300k per mesh * 100 meshes = 30,000,000

2) Godot uses a forward renderer. Typically the most expensive part about rendering is the cost of shading (i.e. calculating lighting) each pixel. To avoid calculating shading on pixels that are eventually covered up by other models, Godot renders every object in a depth-prepass. The depth prepass renders all objects depth with no color. Then when actually drawing models, the GPU can do a depth test and avoid shading fragments that will eventually be covered up. Normally this makes rendering way faster, but when a scene is bottlenecked by the number of vertices, and the scene uses simple lighting, the depth prepass may actually bottleneck rendering. If you are rendering a scene with an extremely high number of vertices, but with relatively simple lighting, you can turn off the prepass in ProjectSettings using depth-prepass-enable. TLDR; the depth prepass draws every object an additional time. So in a single frame, every object is drawn twice

total vertices so far 300k per mesh * 100 meshes * 2 = 60,000,000

Finally, each light with shadows needs to render at least one more time. In the above scene it looks like a directional light is used. DirectionalLight shadows by default use PSSM with 4 splits. Each split requires rendering objects once. Typically objects are culled and don't need to be in more than 1 or 2 splits, but large objects (or MultiMeshes that span a large area) may be included in all splits, requiring them to be drawn up to four additional times.

total vertices so far 300k per mesh * 100 meshes * (1 depth prepass + 1 shaded + 4 shadowmaps) = 180,000,000

In the end we have a total of 180 million vertices expected with the setup in OP's original post. Below I will discuss how this impacts performance and why we get different numbers for MultiMesh, MeshInstance, and GridMap. The short answer is culling.
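The running total above can be reproduced with a short sketch. The function `estimate_vertices_in_frame` and its parameters are hypothetical names for illustration, not an engine API; it assumes nothing is culled, which matches a single MultiMesh that spans all shadow splits.

```python
def estimate_vertices_in_frame(
    indices_per_mesh: int,
    instances: int,
    depth_prepass: bool = True,
    shadow_passes: int = 0,
) -> int:
    """Rough upper bound for RENDER_VERTICES_IN_FRAME when nothing is culled.

    Every instance is drawn once for shading, once more if the depth
    prepass is enabled, and once per shadow pass it is included in.
    """
    passes = 1 + (1 if depth_prepass else 0) + shadow_passes
    return indices_per_mesh * instances * passes


# OP's tree: 100k faces * 3 = 300k indices, 100 instances.
indices_per_tree = 100_000 * 3

# Shaded pass only.
print(estimate_vertices_in_frame(indices_per_tree, 100, depth_prepass=False))  # 30000000
# Shaded pass + depth prepass.
print(estimate_vertices_in_frame(indices_per_tree, 100))  # 60000000
# Shaded pass + depth prepass + 4 PSSM shadow splits.
print(estimate_vertices_in_frame(indices_per_tree, 100, shadow_passes=4))  # 180000000
```

The estimate is only an upper bound: with per-object culling, each pass would usually see fewer instances.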

Why is RENDER_VERTICES_IN_FRAME different for MultiMesh, MeshInstance, and GridMap?

RENDER_VERTICES_IN_FRAME only counts vertices that get sent to the GPU for drawing. It doesn't include vertices from objects that are, for example, behind the camera.

During rendering Godot is very careful to avoid rendering objects that won't be visible. If it checks an object and sees that it won't be visible, it culls the item from the draw list. We call this process culling.

The main type of culling that Godot uses is called frustum culling. Frustum culling checks the AABB of the object against the [viewing frustum](https://en.wikipedia.org/wiki/Viewing_frustum) of the camera. If the AABB is completely outside of the viewing frustum then the object is culled.
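A minimal sketch of such a test is shown below. It is not Godot's implementation; the `Plane` and `AABB` types and the `is_culled` function are assumptions made up for illustration. The frustum is modelled as a list of planes whose normals point inward, and the box is culled as soon as it lies entirely behind any one plane. This is the usual conservative test: it never culls a visible box, but it may keep a few boxes that are actually outside.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plane:
    normal: Vec3  # points toward the inside of the frustum
    d: float      # a point p is inside when dot(normal, p) + d >= 0


@dataclass
class AABB:
    lo: Vec3  # minimum corner
    hi: Vec3  # maximum corner


def is_culled(box: AABB, planes: List[Plane]) -> bool:
    """Return True if the box is completely outside at least one plane."""
    for plane in planes:
        # Pick the box corner that lies farthest along the plane normal.
        # If even that corner is behind the plane, the whole box is outside.
        corner = tuple(
            box.hi[i] if plane.normal[i] >= 0 else box.lo[i] for i in range(3)
        )
        distance = sum(plane.normal[i] * corner[i] for i in range(3)) + plane.d
        if distance < 0:
            return True
    return False


# For simplicity the "frustum" here is a cube from -10 to 10 on every axis,
# described by six inward-facing planes.
cube_frustum = [
    Plane((1, 0, 0), 10), Plane((-1, 0, 0), 10),
    Plane((0, 1, 0), 10), Plane((0, -1, 0), 10),
    Plane((0, 0, 1), 10), Plane((0, 0, -1), 10),
]

inside = AABB((-1, -1, -1), (1, 1, 1))
straddling = AABB((9, -1, -1), (11, 1, 1))
outside = AABB((20, -1, -1), (22, 1, 1))

print(is_culled(inside, cube_frustum))      # False
print(is_culled(straddling, cube_frustum))  # False
print(is_culled(outside, cube_frustum))     # True
```

The test works on one bounding box per object, which is why an object that spans a large area is either drawn completely or not at all.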

MeshInstances

MeshInstances draw each object in its own draw call. They are also processed in the node tree by themselves. They are the default object for every object in your game. As you will read below, there are times that it will be faster to use other tools.

The benefit of MeshInstances is that the renderer can cull each object individually. The downside is that you may face a drawcall bottleneck if you have too many.

A good rule of thumb is to use MeshInstances by default and then replace them with MultiMeshes once you have tens of thousands and are facing a draw call bottleneck.

MultiMeshes

MultiMeshes are a very fast way to draw thousands of the same object. The reason they are fast is that they draw all instances of the object at the same time. The downside is that all instances have to be treated as one large object. So, for example, if you have a forest of trees that uses 1 MultiMesh, either the entire forest is drawn or none of it is drawn. There is no ability for the renderer to cull specific instances. This makes the MultiMesh a good choice when you have thousands of objects that are close together and are guaranteed to be visible together. But it makes the MultiMesh a bad choice for when only a few of your objects are visible at a time.

A good rule of thumb is to only use a MultiMesh when you have tens of thousands of an object that will always be visible at once.

GridMaps

The GridMap is a slightly more complex beast. It isn't a single object. It is a helpful utility that allows you to place objects in a grid. The GridMap is limited to objects you have placed in a MeshLibrary. When you add objects to a GridMap, they are divided into "octants". Octants are essentially groups of cells in the grid. Within an octant, all objects that use the same mesh are grouped into a MultiMesh.

At render time, the GridMap draws the octants. The renderer is able to cull each octant. Accordingly, the GridMap allows culling on a much finer level, while still maintaining the benefits of the MultiMesh.
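The octant grouping can be pictured with the following sketch. It is not GridMap's actual code; the function `group_into_octants`, the dictionary layout and the octant size of 8 cells are assumptions chosen for illustration. Each octant would then be drawn (and culled) as one batch per mesh.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]


def group_into_octants(
    cells: Dict[Cell, str], octant_size: int = 8
) -> Dict[Cell, Dict[str, List[Cell]]]:
    """Group grid cells by octant, then by mesh within each octant.

    `cells` maps a cell coordinate to a mesh identifier. The result maps
    an octant coordinate to {mesh id: list of cells}, i.e. one
    MultiMesh-like batch per mesh per octant.
    """
    octants: Dict[Cell, Dict[str, List[Cell]]] = defaultdict(lambda: defaultdict(list))
    for (x, y, z), mesh_id in cells.items():
        key = (x // octant_size, y // octant_size, z // octant_size)
        octants[key][mesh_id].append((x, y, z))
    # Convert to plain dicts so the result is easy to inspect.
    return {key: dict(batches) for key, batches in octants.items()}


# A 16 x 16 grid of the same plant on the ground plane (y = 0),
# similar to the test scene described earlier in the thread.
grid = {(x, 0, z): "plant" for x in range(16) for z in range(16)}
octants = group_into_octants(grid)

print(len(octants))                    # 4
print(len(octants[(0, 0, 0)]["plant"]))  # 64
```

With this layout the 256 plants end up in 4 batches of 64, so the renderer can skip whole batches that fall outside the view instead of drawing all 256 plants every time.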

Conclusion

All three nodes have different pros and cons. MultiMeshes are best when you are draw call bottlenecked, while MeshInstances are best when you have a vertex bottleneck. GridMaps strike a balance between the two, while also exposing a unique way of authoring scenes. No one is "better" than the others, and choosing between them will require heavy profiling of your scenes.

In general, more productive optimizations will include:

dioptryk commented 3 years ago

Thank you for an excellent and comprehensive answer, @clayjohn. This is very enlightening and allows many of us to better understand what happens behind the scenes. I think this issue can now be closed.

Zireael07 commented 3 years ago

@Calinou: Could @clayjohn's answer be put somewhere visible in documentation before this issue is closed?

slapin commented 3 years ago

I think closing this issue before the documentation is updated is a very bad idea.

akien-mga commented 3 years ago

Moved to godot-docs.