Occlusion Culling - Githubissues

Shfty commented 4 years ago

Should Qodot have its own system for this, since Godot doesn't have anything built-in as of 3.1?

My initial reaction is that building a solution to operate in more general Godot terms would be a better time investment, since each brush gets converted into a standard MeshInstance using an ArrayMesh to hold vertex data.

However, under the current model culling would have to be AABB / Box-Sphere based, since you can't toggle visibility per face.

A better solution, and one that would take advantage of the .map format in a similar way Quake does, would be to instead create one mesh per brush face. That way each face could update its visibility separately based on simple point-in-frustum + normal-dot-camera checks.

It would be better if ArrayMesh offered the ability to selectively show/hide its component surfaces, but that doesn't appear to be a thing. Could store the vertices in an array somewhere and filter them into the ArrayMesh based on visibility, but that seems like it would be a lot of data-thrashing vs the multiple mesh solution.
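As a rough illustration of the per-face "point-in-frustum + normal-dot-camera" test described above, here is a minimal Python sketch (not Qodot or Godot code). The `Plane` and `Face` types and all function names are hypothetical. It assumes frustum planes with inward-pointing normals and tests only the face centre, so a large face whose centre is off-screen could be wrongly hidden; a real implementation would need to test the face's vertices or bounds.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class Plane:
    normal: Vec3  # assumed to point towards the inside of the frustum
    d: float      # a point p is inside when dot(normal, p) + d >= 0


@dataclass
class Face:
    center: Vec3  # representative point on the brush face
    normal: Vec3  # outward-facing normal of the brush face


def point_in_frustum(point: Vec3, planes: List[Plane]) -> bool:
    # Inside only if the point is on the inner side of every plane.
    return all(dot(p.normal, point) + p.d >= 0.0 for p in planes)


def faces_camera(face: Face, camera_pos: Vec3) -> bool:
    # Front-facing when the normal points towards the camera position.
    return dot(face.normal, sub(camera_pos, face.center)) > 0.0


def face_visible(face: Face, camera_pos: Vec3, planes: List[Plane]) -> bool:
    return faces_camera(face, camera_pos) and point_in_frustum(face.center, planes)
```

With one mesh per face, the result of `face_visible` would drive each mesh's visibility flag once per frame.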

Wavesonics commented 4 years ago

This might be relevant, someone is making a room/portal add-on: https://github.com/godotengine/godot/issues/22048#issuecomment-546687593

Shfty commented 4 years ago

From therektafire on the TrenchBroom discord:

I was thinking what you could do for rendering optimization is have the mapper divide the rooms into func_groups and give each group a targetname with a delimited list of other func group rooms that should be able to be seen from there, and have the room itself covered by a brush with the "hint" texture which gets converted into an area3d and works similarly to the method described in the video i showed (https://www.youtube.com/watch?v=QXUCGzJUfkc&t=2s)

@Wavesonics Quite possibly, since that sort of tech is very well-optimized for Quake-style geometry.

I'd want to keep Qodot's core stuff as agnostic as possible and avoid tying it to a given optimization framework, so it could still be used optimally for games where portal rendering wouldn't work well (ex. open world with quake maps as free-standing structures). But, I'm open to the idea of integrating existing culling systems (as well as my own, if I take a shot at it) as bolt-on modules.
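The func_group idea quoted above amounts to a hand-authored visibility table. A minimal Python sketch of how such a table might be parsed and queried follows; the comma delimiter, the room names, and both function names are assumptions for illustration, not part of Qodot or the .map format.

```python
from typing import Dict, Set


def parse_visibility(rooms: Dict[str, str], delimiter: str = ",") -> Dict[str, Set[str]]:
    """Turn {room: "other_a, other_b"} into {room: {room, other_a, other_b}}."""
    table: Dict[str, Set[str]] = {}
    for name, raw in rooms.items():
        visible = {part.strip() for part in raw.split(delimiter) if part.strip()}
        visible.add(name)  # a room can always see itself
        table[name] = visible
    return table


def rooms_to_show(table: Dict[str, Set[str]], current_room: str) -> Set[str]:
    # Unknown room (e.g. camera outside every hint volume): show everything.
    return table.get(current_room, set(table))
```

At runtime, the Area3D-style trigger volume for each room would report which room the camera is in, and every room not returned by `rooms_to_show` would be hidden.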

Shfty commented 4 years ago

Another note from therektafire - the hint texture used for QBSP vis-ing could be repurposed as an occlusion culling support device.

That said, unless the optimization strategy implemented is BSP, this would break the intended usage of that texture for Qodot maps. Bit of a fine line to walk there, since it's desirable to maintain the original BSP-based semantics, but the functionality might be useful.

Probably best to introduce new Qodot-specific special textures that are intended for modern acceleration structures instead, if necessary.

HeadClot commented 4 years ago

Seems like @Wavesonics brought it up yesterday. But here is the Github repo for that specific project.

https://github.com/lawnjelly/godot-lportal

HeadClot commented 4 years ago

On the topic of occlusion culling, @ShiftyAxel, do you plan on making it so that a texture with a certain naming convention culls whatever is behind it? Just curious.

Shfty commented 4 years ago

@HeadClot Since culling is an implementation detail that depends on the game, that probably won't be part of the base 'get the geo in and implement brush/face common across all games' functionality for the new *Mapper classes that turn brush and entity data into functional nodes.

But those systems will make it very easy to layer such a feature on top. I'm leaving prospective implementation details open for now since I want to focus on covering the basic use cases, but it could be done.

Shfty commented 4 years ago

It would be wise to try and integrate LPortal with a Qodot scene, either as part of the example content or in a separate git repo.

Could ignore it via .gitattributes to prevent it from becoming a dependency, shouldn't need to touch Qodot's inner workings like the copy-lib of TextureLayeredMesh does.

Shfty commented 4 years ago

Just noticed that LPortal requires compiling Godot from source. Leaning on a custom version of the engine brings way more baggage than I'm comfortable with Qodot having, so I'm moving this back to long-term.

lawnjelly commented 4 years ago

> Just noticed that LPortal requires compiling Godot from source. Leaning on a custom version of the engine brings way more baggage than I'm comfortable with Qodot having, moving this back to long-term.

I'm of the view that doing performance critical stuff like occlusion culling is usually more sensible in c++, despite the added burden of compiling from source. However once e.g. 3.2 is stable I might be able to get some pre-built binaries available for download. GDNative is also a possibility, but I haven't experimented with that yet and don't know how feasible it is in terms of distribution. You might be able to have some success with a PVS in gdscript or c#. Bastiaan has an example just simply showing / hiding rooms as you move around.

LPortal should work pretty well with quake levels, I've actually done similar systems with PVS with quake 3 type levels in the long distant past. I've previously done the culling by poly approach you mention by having a pre-built static vertex buffer and having a dynamic index buffer to decide which polys to draw. This isn't so easy in Godot, but I'm not sure it is necessary these days, a coarse PVS probably gets you 99% of the benefit (many games are shipped with coarse PVS). The bottlenecks now are often different to those in the 90s / 2000s, I wrote a bit about this here. It should become cheaper to split up meshes even more with vulkan.

You could also have multiple versions of the same mesh, depending on where you are viewing from, as quake type geometry doesn't take much memory. Or even just use immediate geometry to draw the big occluders only for a z pre-pass. You might even find that with low poly levels like quake you can just blast the whole thing at the GPU using the z pre-pass, and it will not be that much slower than a fully culled version, if you are using anything much more than simple shaders.

I have done a little bit of evaluating loading in old game levels for testing / demoing purposes. I hopefully will get round to a proper look at qodot .. I even had a look at duke nukem files, however I'm currently exploring writing something to make procedural levels that will be suitable for a small demo first person shooter. This should make it easy to build navmeshes etc at the same time.

Shfty commented 4 years ago

> I'm of the view that doing performance critical stuff like occlusion culling is usually more sensible in c++, despite the added burden of compiling from source. However once e.g. 3.2 is stable I might be able to get some pre-built binaries available for download. GDNative is also a possibility, but I haven't experimented with that yet and don't know how feasible it is in terms of distribution. You might be able to have some success with a PVS in gdscript or c#. Bastiaan has an example just simply showing / hiding rooms as you move around.

I agree that using C++ is the best way to go for performance, but making changes directly to the engine makes for a non-modular solution and ties you to a specific version of the codebase. It's probably a bit more viable in Godot since the release cycle is fairly slow, but I had an absolute nightmare dealing with UE4 source modifications when I was using it for VR production work to the point where I'm now generally wary of the idea.

Granted, that doesn't mean Qodot couldn't support LPortal since it would be an optional feature and probably sectioned off into another plugin to keep dependencies in check, but I'd at least want to wait until 3.2 is stable and you have a distribution setup sorted out before looking into an official implementation.

I haven't experimented much with GDNative yet either, but going by the key features I'd consider it the best solution for walking the line between performance and ease-of-use for the end user. I'm planning to use it to speed up Qodot's generation process eventually, which should let me keep it as a cross-platform asset library-compatible plugin.

As far as more coarse-grained culling systems go, that looks like some really useful information- cheers. I've thrown around ideas in my head with things like room AABBs and door links, but most all of them have ended up with edge cases that would need to be designed around.

The Z prepass sounds quite promising, since a single draw call atlased mesh of Quake E1M1 can hit ~24FPS on an Intel 405 as-is. Is the idea there that you use discard to pre-emptively terminate a second-pass custom fragment shader on any polygon that's determined to be invisible based on the pre-created Z-buffer?

On loading older levels in, the Quake BSP format includes planar n-sided polys to represent visportals that are generated at compile-time by a program called VIS. I considered the idea of pulling them out to use as culling data, but that introduces a dependency on QBSP and VIS, which adds a bunch of compile time to the process only to throw away half the data. Not ideal for Qodot unless I introduce optional BSP support, but it might be a good place to start if you're looking for importable example content.

lawnjelly commented 4 years ago

> On loading older levels in, the Quake BSP format includes planar n-sided polys to represent visportals that are generated at compile-time by a program called VIS. I considered the idea of pulling them out to use as culling data, but that introduces a dependency on QBSP and VIS, which adds a bunch of compile time to the process only to throw away half the data.

If you are editing map files, and you end up using some c++, then as you say it is an option to just use existing MAP -> BSP compiler to give you all the relevant data for runtime (in terms of identifying which BSP area you are within, and the PVS), and just package it up into something Godot friendly. You could also maybe spawn the compiler as an external process. You can't really use the map file 'as is' anyway, from memory it is just a bunch of brushes, which isn't very useful in terms of visibility. If you were going to use portals / rooms you'd need to know where to place them, which is what the BSP compiler does for you. Doing it yourself is not trivial.

I guess I was kind of assuming you would be using BSP converter. If you require the user to manually place portals / rooms, you are missing out on a lot of the benefit of using map files. Part of the reason for the use of brush based map files afaik is it makes it easier to auto-calculate things like bsp planes, cells, portals etc. Brushes are of course also great for collision detection.

Shfty commented 4 years ago

I did some experimenting with software-rasterized occlusion culling in my RasCull repo: https://github.com/Shfty/rascull

My main takeaway is that GDNative is very performant so long as you avoid calling into the Godot API for anything except passing data back and forth, but the API itself is too limited for a use-case like culling. The functions necessary for gathering potentially-visible objects are only available in GDScript, which causes a 4-8ms bottleneck even after some fairly extensive optimization.

That's already enough to compromise a 120FPS performance target without factoring in the ~10-12ms it takes to rasterize and depth test the various scene objects. There's a lot of room for optimizing the rasterizer, but it doesn't seem worthwhile given the GDScript bottleneck.
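To make the budget argument concrete, the arithmetic can be worked through as below. This is a small sketch using only the figures quoted above; it assumes the gather and rasterize steps run serially on the frame's critical path, and the function names are made up for illustration.

```python
from typing import Tuple


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


def culling_cost_range_ms(
    gather_ms: Tuple[float, float], raster_ms: Tuple[float, float]
) -> Tuple[float, float]:
    """Best and worst case total when both steps run one after the other."""
    return (gather_ms[0] + raster_ms[0], gather_ms[1] + raster_ms[1])


budget = frame_budget_ms(120.0)
low, high = culling_cost_range_ms((4.0, 8.0), (10.0, 12.0))
print(f"budget {budget:.2f} ms, culling {low:.0f}-{high:.0f} ms")
```

At 120 FPS the whole frame has roughly 8.33 ms, so the 4-8 ms gather step alone uses about half to nearly all of it, and the combined 14-20 ms would cap the frame rate at roughly 50-71 FPS before any rendering happens.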

An engine module is likely the best route to performant culling at this point. Unfortunately, based on some further research, the distribution model for it looks pretty rough: as best I can tell there's no way to get around needing to compile the engine if you're working with a custom module, which is going to be a major turn-off for a lot of users.

Godot just got an Epic MegaGrant to work on their graphics tech, so I think it's best I close this with a "wait for 4.0" and focus on the geometry / entity generation that is Qodot's core use case.

lawnjelly commented 4 years ago

> I did some experimenting with software-rasterized occlusion culling in my RasCull repo: https://github.com/Shfty/rascull

Congrats on getting anything working (at all) as far as software-rasterized occlusion. You were brave to attempt that in GDNative! :+1:

Afaik even getting a performance boost at all using raster approaches is more tricky than it might seem (you can easily end up in a situation where you are slowing performance). Reduz has plans to try some raster approaches for 4 (I think last time I looked he was keen on trying reprojected depth buffer testing on CPU):

https://github.com/godotengine/godot/issues/22048

https://twitter.com/reduzio/status/1084460955420057600?lang=en

But as Sebastian Aaltonen says on twitter, reprojected depth buffer is not going to be ideal, even if you get it working. Also with anything like reading back depth buffers, or depth testing on the GPU, you are very hardware dependent, and as Godot is multiplatform, that makes it quite difficult (you might get stalls, or it simply not working). That kind of thing is much easier if you are developing for e.g. a specific console.

In that sense (conservative?) software rasterizer is the easiest of the raster approaches to make 'cross platform'. For this approach z buffer min/max and / or tiled techniques seem to be a good option. Intel has done a bit in this area, this may be the link, not sure:

https://software.intel.com/en-us/articles/masked-software-occlusion-culling

On mobile, tiled renderers already take this approach: if you store the min / max z for each tile and calculate the min z for a triangle, you can quickly reject it if it is past the max z for the tile. You can do the same for objects in a software occlusion approach. The idea is to do the testing on groups of pixels instead of individual pixels, because CPU doesn't have as much horsepower as GPU.
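A minimal Python sketch of the tile max-z rejection test described above. The `TileDepthBuffer` class is hypothetical and simplified: occluders and test objects are integer screen-space rectangles with a single depth value, where a real rasterizer would work on triangles. A tile's max z is only tightened when an occluder covers the whole tile, which keeps the test conservative.

```python
import math
from typing import List


class TileDepthBuffer:
    """Coarse depth buffer storing one 'farthest visible depth' per tile."""

    def __init__(self, width: int, height: int, tile: int = 8) -> None:
        self.tile = tile
        self.cols = math.ceil(width / tile)
        self.rows = math.ceil(height / tile)
        # inf means nothing fully covers the tile yet, so nothing is rejected.
        self.max_z: List[List[float]] = [
            [math.inf] * self.cols for _ in range(self.rows)
        ]

    def add_occluder(self, x0: int, y0: int, x1: int, y1: int, far_z: float) -> None:
        """Register an occluder covering pixels [x0, x1) x [y0, y1)."""
        t = self.tile
        # Only tiles that lie entirely inside the rectangle are updated.
        c0, c1 = max(-(-x0 // t), 0), min(x1 // t, self.cols)
        r0, r1 = max(-(-y0 // t), 0), min(y1 // t, self.rows)
        for r in range(r0, r1):
            for c in range(c0, c1):
                self.max_z[r][c] = min(self.max_z[r][c], far_z)

    def is_occluded(self, x0: int, y0: int, x1: int, y1: int, near_z: float) -> bool:
        """True if the object's nearest depth is behind every tile it touches."""
        t = self.tile
        c0, c1 = max(x0 // t, 0), min((x1 - 1) // t, self.cols - 1)
        r0, r1 = max(y0 // t, 0), min((y1 - 1) // t, self.rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                if near_z <= self.max_z[r][c]:
                    return False  # might be visible in this tile
        # Also reached when the rectangle is entirely off-screen.
        return True
```

Each query touches a handful of tiles rather than every pixel, which is the point of the grouped test.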

lawnjelly commented 4 years ago

> The Z prepass sounds quite promising, since a single draw call atlased mesh of Quake E1M1 can hit ~24FPS on an Intel 405 as-is. Is the idea there that you use discard to pre-emptively terminate a second-pass custom fragment shader on any polygon that's determined to be invisible based on the pre-created Z-buffer?

Sorry I didn't answer this. Z prepass is one of the options in Godot project settings. It draws the opaque objects first using a super cheap shader that just writes the z value to the depth buffer and nothing else. Then it renders as normal, with the z buffer already 'primed'. As the occluded geometry won't pass the z test, it doesn't need to run the fragment shader, so it has much less effect on performance (using up the fill rate).

You don't need to do z prepass on most mobile / tile renderers as they do this already behind the scenes (or something similar). If you do a z prepass on those you are just wasting an extra pass, and will get lower performance.
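The effect of a z prepass on fragment shader work can be illustrated with a toy model of a single pixel. This Python sketch is not how Godot implements it; it assumes the GPU performs the depth test before running the fragment shader (early-z), and both function names are invented for the example.

```python
import math
from typing import Sequence


def shaded_without_prepass(fragment_depths: Sequence[float]) -> int:
    """Fragment-shader runs for one pixel, drawing in the given order
    with a less-than depth test and an initially empty depth buffer."""
    nearest = math.inf
    runs = 0
    for z in fragment_depths:
        if z < nearest:  # passes the depth test, so it gets shaded
            nearest = z
            runs += 1
    return runs


def shaded_with_prepass(fragment_depths: Sequence[float]) -> int:
    """Fragment-shader runs for one pixel when a depth-only pass has
    already written the nearest depth before the main pass."""
    if not fragment_depths:
        return 0
    primed = min(fragment_depths)  # result of the cheap depth-only pass
    # Only fragments at the primed depth survive the less-or-equal test.
    return sum(1 for z in fragment_depths if z <= primed)
```

Drawn back to front, five overlapping surfaces cost five expensive shader runs for that pixel without the prepass and one with it; drawn front to back, both cost one, which is why the gain depends on draw order and shader cost.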

zzador commented 3 years ago

Hello everyone, is there a plan for using the "Rooms & Portals" occlusion culling feature of Godot 3.4 for rendering of the quake maps? Would be a very nice feature especially for bigger maps. The original quake engine used a similar feature called "PVS" if I remember it correctly where every BSP-Leaf had a list of other BSP-Leafs that were visible from the former leaf. Since drawcall/triangle-batch size has drastically increased since then it may be smarter to bundle multiple leafs together and then build a PVS for these leaf-bundles on today's machines.
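The leaf-bundling idea above can be sketched as a conservative merge: one bundle sees another if any leaf in the first can see any leaf in the second. The Python below is a hypothetical illustration; how leaves are assigned to bundles is left open, and the data layout is an assumption rather than the Quake BSP format.

```python
from typing import Dict, Hashable, Set


def bundle_pvs(
    leaf_pvs: Dict[int, Set[int]],
    leaf_to_bundle: Dict[int, Hashable],
) -> Dict[Hashable, Set[Hashable]]:
    """Collapse a per-leaf PVS into a coarser per-bundle PVS."""
    result: Dict[Hashable, Set[Hashable]] = {}
    for leaf, visible_leaves in leaf_pvs.items():
        src = leaf_to_bundle[leaf]
        # Every bundle can see itself.
        seen = result.setdefault(src, {src})
        for other in visible_leaves:
            seen.add(leaf_to_bundle[other])
    return result
```

The merged table is never less visible than the per-leaf one, so it trades some overdraw for fewer, larger batches.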

Shfty commented 3 years ago

There's no near-term plan to integrate the new portal culling as a first-class feature.

Quake culling isn't directly comparable or compatible with Godot's new PVS, as Godot expects the user to manually nest nodes in their scene tree to define rooms and portals. Conversely, Quake BSP takes the flat map structure and bakes its PVS as a separate lookup table via an automated process - input data and output data are separable, rather than being tightly coupled to in-editor parent-child relationships, and the user has no involvement in the generation of that data.

Qodot already has a feature for translating flat map structure into a nested tree based on TrenchBroom groups, but I'm not happy with the way it turned out. The two formats are so disparate that it introduces too many Qodot-specific edge-cases and extra rules to follow while building your map. This makes it a poor fit for a core feature, given that one of Qodot's design goals is to - where possible - avoid forcing design considerations or gameplay code on the user that aren't part of the base map spec.

However, that's not to say Qodot is incapable of working with the new Godot PVS - by associating tool scripts with point and brush entity classes, you can post-process them at build time and restructure the flat set of nodes into appropriately-nested Godot-compatible rooms and portals.

I consider this a better approach than integrating it as another core feature, as it allows the user to define a TrenchBroom > Godot Portals conversion spec that works for their project, rather than being beholden to an attempted general-purpose solution that will add complexity for everyone while not necessarily fitting their needs or aligning with the strengths of the map format.

If someone were to put together a robust version of such a system and contribute it as example content that could be reused or modified on a project-by-project basis, then I'd be happy to ship it as part of the example content. However, I'm not likely to do that myself for the foreseeable, as the majority of my Godot time is being taken up by professional work on a project that can't take advantage of the new portal system.

QodotPlugin / qodot-plugin

Occlusion Culling #14