ellioman / Indirect-Rendering-With-Compute-Shaders

An example of drawing numerous instances using Unity3D, compute shaders and Graphics.DrawMeshInstancedIndirect with Frustum & Occlusion culling and LOD'ing.

All LODs Render Before LOD Ranges #2

Closed mrochon1 closed 5 years ago

mrochon1 commented 6 years ago

LOD1 and LOD2 are always present; they are just 'hiding' inside the next-highest LOD. I noticed the issue when attempting to swap out the sphere meshes.

mrochon1 commented 6 years ago

Using a single mesh LOD set, I noticed that offsetting the Unity instance ID in the material shader by the instance counts of the preceding LODs corrected this.

The idea is that all objects use the same position list, and that list is sorted, so:

  1. LOD0 must use position indexes 0 to (LOD0 instance count)
  2. LOD1 must use position indexes (LOD0 instance count) to (LOD0 + LOD1 instance count)
  3. LOD2 must use position indexes (LOD0 + LOD1 instance count) to (LOD0 + LOD1 + LOD2 instance count)

Passing these offsets via the property blocks or materials used in the DrawMeshInstancedIndirect call corrected everything I mentioned in my previous comment.
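For illustration, here is a minimal Python sketch of the offset idea described above. It is not code from the repository: the function name `lod_start_offsets` and the sample counts are hypothetical. It only shows that each LOD's start offset is the running sum of the instance counts of the LODs before it.

```python
def lod_start_offsets(instance_counts):
    """Return the first position index used by each LOD.

    This is an exclusive prefix sum: each LOD starts where the
    previous LODs' instances end in the shared, sorted position list.
    """
    offsets = []
    running = 0
    for count in instance_counts:
        offsets.append(running)
        running += count
    return offsets


# Hypothetical visible-instance counts for LOD0, LOD1 and LOD2.
counts = [5, 3, 2]
offsets = lod_start_offsets(counts)

# Half-open index ranges [start, end) into the position list, one per LOD.
ranges = [(start, start + n) for start, n in zip(offsets, counts)]
print(offsets, ranges)
```

In a real setup each offset would be handed to the matching draw call (for example through its property block) so the shader can add it to the instance ID.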

But it would need major revisions to work with multiple objects / mesh sets because of how the position list is sorted.

I would need to group them so positions are organized not only by distance but by object as well (Object 1's positions come first in the list, then Object 2's after all of Object 1's). I'm not sure enough of everything that is happening to make those changes correctly.

Though I am kind of curious: was this the intended result? Once the offsets are introduced, all object instances become visible and are represented by the proper LOD.

By that I mean LOD0 comes first with no overlap/intersection between it and LOD1, and LOD2 is used for every mesh in the stack not accounted for by LOD0 and LOD1.

To be more specific, LOD2 doesn't disappear.

Pardon My Grammerz. Super Sleepy..

Really great stuff though. :) Definitely open to giving you a hand here and there for offering up a nice example of this technique. I'm pretty positive I can correct the bounds issue, and I have a better/faster culling method for AABB frustum tests I would like to try out.

Edit *

Just noticed the sort does account for each mesh type. I'll play around more with the offset. I think I know how to fix the issue I was having.

ellioman commented 6 years ago

Hey mrochon1, Thanks for giving this a go and helping me out. It is much appreciated :)

This issue is super weird, I just tested this again and I don't have the same problem. If I change meshes then everything works fine without adding the offsets to the shader rendering the instances.

First, are you running the project on a Mac or a PC? I'm wondering if there's different behaviour (and a possible Unity bug) here depending on whether we're using Metal or DirectX.

To explain, I am using Kostas' approach, explained here:

  1. https://interplayoflight.wordpress.com/2017/11/15/experiments-in-gpu-based-occlusion-culling/
  2. https://interplayoflight.wordpress.com/2018/01/15/experiments-in-gpu-based-occlusion-culling-part-2-multidrawindirect-and-mesh-lodding/

My code sorts the position list by object type and, within each type, by distance. Then I go through the steps of finding out which instances are visible and creating a buffer with those, ordered by object type and distance.
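As a rough illustration of the "find the visible ones and build a buffer with those" step, here is a minimal Python sketch. It assumes a scan-based stream compaction (an exclusive prefix sum over visibility flags), which is one common way to do this on the GPU; it is not taken from the repository's compute shaders. The names `compact_visible` and `in_range`, and the distance threshold, are hypothetical.

```python
def compact_visible(sorted_instances, is_visible):
    """Keep only visible instances, preserving their sorted order."""
    flags = [1 if is_visible(inst) else 0 for inst in sorted_instances]

    # Exclusive scan: write_index[i] = number of visible instances before i.
    write_index = []
    running = 0
    for flag in flags:
        write_index.append(running)
        running += flag

    # Scatter each visible instance to its slot in the compacted buffer.
    out = [None] * running
    for inst, flag, dst in zip(sorted_instances, flags, write_index):
        if flag:
            out[dst] = inst
    return out


# Instances as (object_type, distance), already sorted by type then distance.
instances = [(0, 1.0), (0, 60.0), (1, 5.0), (1, 80.0), (2, 10.0)]


def in_range(inst):
    # Hypothetical stand-in for the frustum and occlusion tests.
    return inst[1] < 50.0


visible = compact_visible(instances, in_range)
print(visible)
```

Because the scatter keeps the input order, the compacted buffer stays ordered by object type and distance, which is what the per-draw offsets rely on.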

In Kostas' example he needs to send a startInstanceOffset, but I never needed to on my machine. If needed, we could simply fix that by:

  1. sending the argsBuffer to the rendering shader
  2. giving each materialPropertyBlock the index of its relevant position in the buffer
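The two steps above can be sketched in Python as follows. Unity's documentation for Graphics.DrawMeshInstancedIndirect describes the args buffer as five integers per draw: index count per instance, instance count, start index location, base vertex location and start instance location. Everything else here is an assumption for illustration: `build_args` and `fetch_position` are hypothetical names, and the lookup only mimics what a rendering shader might do with the args buffer and a per-draw offset from its property block.

```python
ARGS_PER_DRAW = 5    # index count, instance count, start index, base vertex, start instance
START_INSTANCE = 4   # slot of the start instance location within one draw's args


def build_args(index_counts, instance_counts):
    """Build a flat args buffer with one 5-integer record per draw call."""
    args = []
    start_instance = 0
    for index_count, instance_count in zip(index_counts, instance_counts):
        args += [index_count, instance_count, 0, 0, start_instance]
        start_instance += instance_count
    return args


def fetch_position(positions, args, args_offset, instance_id):
    """Mimic the shader: add this draw's start instance to the instance ID."""
    return positions[args[args_offset + START_INSTANCE] + instance_id]


# Hypothetical data: three LOD draws with 2, 1 and 2 visible instances.
args = build_args(index_counts=[36, 24, 12], instance_counts=[2, 1, 2])
positions = ["a", "b", "c", "d", "e"]

# Each draw's property block would carry its own args_offset (0, 5, 10).
lod1_first = fetch_position(positions, args, 1 * ARGS_PER_DRAW, 0)
lod2_second = fetch_position(positions, args, 2 * ARGS_PER_DRAW, 1)
print(lod1_first, lod2_second)
```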

Thanks again for the help. Will be fun taking this further :)

mrochon1 commented 6 years ago

Hey again,

First off, sorry for the late response; I've been swamped over the past few weeks.

Currently the test system I was using is Windows-based.

I'll download the latest build on my laptop tonight and confirm tomorrow whether or not the problem still persists. There were a couple of issues as well when using more than 2 objects. Basically I'll take the time to properly document and capture images with a 'stock' version of the 'problems', as well as the band-aids I implemented around the issues, if they still persist.

I think that's actually similar to what I did, in regards to the Kostas comment, though I have to check to be exact. I know I was adjusting the argsBuffer startInstanceOffset via the CPU, which obviously isn't optimal. I don't recall much about the materialPropertyBlock, probably because I didn't need to touch it much, haha. But yeah, as mentioned, I'll capture screens and such of the changes I made to be concise.

I'll definitely offer up as much help as I can though. I'll be more thorough sometime over the next few days when I get a chance to give everything a good look; I just figured I'd follow up quickly before bed to let you know I'm still interested in resolving this issue, if it is one. Really exciting stuff. :) Looking forward to seeing where this one goes.

Edit**

So basically the note you added seems to be correct for the Windows / DirectX 11 case. Nothing major anyways.

https://ibb.co/nktHdU

Here's a simple screenshot. Let me know if you have any issues opening it; I just kind of half-assed it real quick before bed. Don't mind the zoom, I just wanted to make sure what I had mentioned was obvious.

I'm using a different Windows-based machine again. I cranked the zoom up and you should be able to see green popping through the red spheres. The blue LODs flicker as well, like on my previous machine. From what I remember, the thought was, on a whim, that if I pushed the green's start offset forward by the number of red units, and then pushed the blue's forward by the number of red and green units summed, all meshes should be visible, and that seemed to fix the first issue. I know the sort accounts for the 'mesh ID' (hope I'm at least close with that label) as well as distances, so I was able to get this to work again using two mesh LOD sets, but when I jumped to three I had another problem. We'll start here, though, and make sure this actually is an issue. I should also mention the screen grab is from a clean copy I freshly downloaded just now.

Red Spheres in the bottom center have the most obnoxious 'artifacts'

I'll set up the trees again tomorrow though, as it's easier to capture. It just looks realllly obvious with minor offsets between LODs, like a green and a red tree in the same place. Though if I'm on the right page, it would be super easy to implement a dither. Kind of gushing over the potential of all of this.

I'm sure that's probably apparent from my last post. Just want to be thorough. :)

mrochon1 commented 6 years ago

Alright, so I took a few screenshots of the script changes I've made, for anyone else who wants to fiddle, as I'm sure you're already aware of how to remedy the issue.

I made this change to the portion of the InstanceRenderer that calls the DrawMeshInstancedIndirect function, like so. You might need to manually copy and paste the links to the images posted below:

Instance Renderer Changes

https://ibb.co/gxS3Np

Instance Shader Changes

https://ibb.co/e1tchp

Everything Working :

https://ibb.co/f1bxhp

I was also curious, before reporting another 'issue': I noticed it will only work with a binary (power-of-two) instance count? I assume this is a symptom of the design, considering it seems a little too coincidental that it only works with the instance counts of 1, 2, 4, and 8 that I have tried so far.

If so, I'm just wondering if it would be possible to use unused instances as 'dummies'? Either way I'm really interested in getting my culling code in now lol.

Don't mind the shapes either; Just using built in meshes for the sake of testing without sourcing a bunch of stuff.

Edit**

So I'm going to say the above-mentioned issue with binary instance counts seems to be a bug.

Basically, meshes were rendering in the proper places but were not using the proper LOD when the count of any instance wasn't binary, along with the total combined instance count being binary as well.

It seems to be related to GPU sorting, as turning on CPU sorting fixes the problem. I will note as well that re-enabling GPU sorting after CPU sorting seems to alleviate the issue, so nothing super major on my end; everything seems to function as intended, and flagging the unused instances' draw ID is working for the most part as intended.

I'll personally fiddle more tomorrow and see if I can possibly figure it out. I thought it might be worth mentioning, though, that there's something sketchy happening, hehe.

Edit** Two

Playing around more, it seems the issue still arises after the CPU sort occurs: the LODs slowly start to 'un-sync' from the CPU result and become all jumbled again. I'll poke around more. Any reason this might be occurring?

I set it up to subtract 1000, then 7000 instances from each instance group from the random spawner code in the example.

At 1000 I would see the odd blue object mixed in with red: LOD2 meshes bundled within LOD0, and within LOD1 too it seems, though LOD2 was more prominent, if that's even important.

The bigger the offset, the more mixed up everything becomes. It seems the camera angle affects the LOD mix-up as well. At 7000 it's a big mess haha.

The gist is I think the padding is a wee bit broken. If I remove the bool in the GPU sort that checks for an ID match, it stops the mix-up ... somewhat. I end up with big rings of LOD1 inside LOD2 that expand as I point the camera down along the y axis.

So not sure where to start personally but happy to mess around more anyways.

mrochon1 commented 6 years ago

I managed to fix the next issue I was having; now I can use any number of objects with instance counts of any size and everything behaves normally. I'll cover that when I wake up. Basically, rather than use an 'and' conditional check in the sort, I used the DrawCallID to offset the distance from the camera. So it sorts ascending first by DrawCallID and then ascending by distance from the camera, as I assume was the intention. Not sure if that's a DirectX thing or just a minor mistake.
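To make the idea concrete, here is a minimal Python sketch of folding the draw call ID into the distance so that a single comparison orders by draw call first and by distance second. It is only a guess at the shape of the fix described above, not the actual shader change. `MAX_DISTANCE` and `sort_key` are hypothetical, and the sketch assumes every camera distance is non-negative and below that bound.

```python
MAX_DISTANCE = 10_000.0  # assumed upper bound on camera distance (hypothetical)


def sort_key(draw_call_id, distance):
    """Single scalar key: draw call ID in the high part, distance in the low part."""
    # Clamp so a far-away instance can never spill into the next draw call's range.
    clamped = min(distance, MAX_DISTANCE - 1.0)
    return draw_call_id * MAX_DISTANCE + clamped


# Instances as (draw_call_id, distance_from_camera), in arbitrary order.
instances = [(1, 12.0), (0, 300.0), (2, 1.0), (0, 4.5), (1, 2.0)]
ordered = sorted(instances, key=lambda inst: sort_key(*inst))
print(ordered)
```

A single scalar key means the sort needs only one comparison per pair instead of a separate ID match check. On a GPU with 32-bit floats, a large bound costs distance precision, so the bound would need to be chosen with that in mind.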

Kind of silly how easy the fix was versus the effort getting there...

I'll check to see if anything else comes up anyways, and I'll post the fixes I implemented.

I promise they are nowhere near as naive as the stuff above haha. I was doing a lot of modding and experimenting on the side, and truthfully never played with the 5th arg in the buffer either. Pain in the ass to even find documentation in relation to it. So I'll post a more 'proper' fix; I just figured, for anyone who was thinking it was 'broken', it definitely wasn't... really haha. But anyways, as you had mentioned, the argsBuffer should be passed to the instance shader, and each instance group's offset into the buffer should be passed to the property blocks when initiating the instances. Think that covers all the wacky crap I've found. :)

Not sure if Windows support is something you'd be aiming at as well or you're just putting a nice system out there, but after I confirm a little more tomorrow that the direction I'm taking is sufficient, I'm thinking it's safe to flag this guy as solved... :)

So with that in mind, I'll get back to you tomorrow.

Note ** I talk a lot, as I'm sure you've noticed lol. I was just having the same issues with any Windows-based machine I used and figured the least I can do is relay what I find.

ellioman commented 6 years ago

Hey mrochon1, I'm sorry for never replying to your posts, got caught up in job hunting and other boring things :) It's quite awesome to see that you managed to improve the stuff I've put here.

I'm curious about how you solved the problem with any number of instances. Currently I'm padding the data so the number of instances becomes a power of two, which is something I really need to get rid of.
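For context, here is a minimal Python sketch of why a bitonic sorting network wants a power-of-two length and how padding can work around it. It is not the repository's compute shader. Using infinity as the sentinel key for the dummy entries is an assumption for illustration, as are the function names.

```python
import math


def next_power_of_two(n):
    """Smallest power of two that is >= n (and at least 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def bitonic_sort(keys):
    """Sort keys in place, ascending, with a bitonic network.

    Every element is compared with the partner at index i XOR j,
    which only pairs up cleanly when len(keys) is a power of two.
    """
    n = len(keys)
    assert n & (n - 1) == 0, "bitonic sort needs a power-of-two length"
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (keys[i] > keys[partner]) == ascending:
                        keys[i], keys[partner] = keys[partner], keys[i]
            j //= 2
        k *= 2
    return keys


def sort_with_padding(keys):
    """Pad with sentinel keys that sort last, sort, then drop the padding."""
    pad = next_power_of_two(len(keys)) - len(keys)
    padded = list(keys) + [math.inf] * pad  # hypothetical dummy entries
    return bitonic_sort(padded)[:len(keys)]


data = [5.0, 1.0, 4.0, 2.0, 3.0]  # five keys get padded up to eight
result = sort_with_padding(data)
print(result)
```

Because the sentinel keys compare greater than any real key, the dummies collect at the end of the buffer and can be ignored by later passes.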

Could you also show me how you modified the bitonic sorting by offsetting the camera distance?

As for Windows support, yes, I plan to support that, hopefully quite soon, as I now have access to a Windows machine at work.