RDambrosio016 commented 2 years ago

👋 trying to use depth testing revealed a good amount of bugs which i seem to have solved, i don't think i can easily PR the bug fixes back because my own fork is wildly modified, i removed the entirety of the arc vs rc abstraction, i found it kind of useless for the sheer amount of boilerplate it adds, i also made the event loop wait for the submission command buffer before beginning recording, that is because otherwise you are unable to rely on a resource borrowed by the graph to be freed/have no other refs by the time recording begins again. This causes tons of issues for things like frame-specific buffers, i also made the event loop wait on all of the submission command buffers on resize, this is a bit of a hack ngl but its needed to ensure correctness for frame-specific buffers. This is all unrelated but might be worth considering.

Anyways, here are some bugs i found:

screen-13 totally disregards the graphic pipeline info's depth stencil mode, it is not used at all, the only considered depth stencil mode is the override with set_depth_stencil, which is really confusing. I fixed this by using .or() to consider the graphic pipeline info when creating the pipeline if depth_stencil is not set.
If a depth stencil is cleared, it must not only set stencil_op_load, it needs to also set the normal op_load to CLEAR, DONT_CARE results in undefined values which just absolutely breaks depth test. Fixed this with:

            for attachment_idx in first_exec.clears.keys().copied() {
                let attachment = &mut attachments[attachment_idx as usize];
                if matches!(depth_stencil, Some(depth_stencil_attachment_idx) if depth_stencil_attachment_idx == attachment_idx)
                {
                    // DEPTH/STENCIL
                    // Note: Layout will be set if (..when..) we're resolved or stored
                    // We don't set depth/stencil initial layout here because we don't
                    // know the view aspect flags yet - we let the store or resolve op
                    // set the initial layout
                    attachment.stencil_load_op = vk::AttachmentLoadOp::CLEAR;
                } else {
                    // COLOR
                    attachment.initial_layout = vk::ImageLayout::COLOR_ATTACHMENT_OPTIMAL;
                }
                attachment.load_op = vk::AttachmentLoadOp::CLEAR;
            }

But i think i will probably totally separate depth test and stencil test, since specifying clear and store for the stencil is useless if its not actually used.

I also find it odd that there is (to my knowledge) no possible way to clear an attachment yet not store to it, but maybe i am missing it. I do not need to store to the depth texture, it is just an intermediate texture. But store is used to specify what the actual texture for the attachment is, you can't clear and not specify a store because then it has no idea what texture to use.

A final weird bit i found is that in general screen-13 uses a depth clear of 0.0 and a test op of GREATER_OR_EQUAL, but this is not standard, everything i have ever consulted uses 1.0 and LESS_OR_EQUAL. This is not a bug but it should probably be changed to be the standard method.

attackgoat commented 2 years ago

I have not yet written any demos or examples for the depth/stencil; so these bugs are new and I agree with these findings. It will take me a few days to read your changes and the specs and see how to PR this, but I would like to work together.

With the arc/rc thing; I also totally agree. It was added so that web servers and such could do single-threaded graphics fast, but that is the one use-case. There is no "Rasberry Pi with a Vulkan GPU" which might benefit from this pattern. When it comes to changing things and moving things around I am able to do this while retaining the current arc/rc genericness although I am considering removing the pattern so that others can too. I had been hoping GATs would appear sooner and make this go away, but they're not stable yet.

Short backstory: this library partially started from a web server that rendered video, back in 2016, so the use-case of wanting minimal latency wasn't an abstract goal, and it's one that I would really enjoy continuing to support so long as it does not cause new contributors too much burden.

attackgoat commented 2 years ago

... wait for the submission command buffer before beginning recording, that is because otherwise you are unable to rely on a resource borrowed by the graph ...

This sounds like a bug - the resource should be ready to use on another graph right away. The second graph is supposed to notice the binding state and place a barrier to wait for that work before executing.

RDambrosio016 commented 2 years ago

Think i worded that wrong, i meant that if you give out, say, an arc'd binding, the deferred drop wont be done by the time the frame loops back around, so the strong count won't be 0, that caused some issues

attackgoat commented 2 years ago

I understand, this is expected. For per-frame resources right now the pattern that works is to lease an item from a HashPool cache; directly after leasing it you know that it has no strong references. If you instead directly create, say, a buffer and wrap it into a binding you can only be sure it has a zero strong count before binding to the first graph. If these patterns are failing it would be a bug or bad API.

RDambrosio016 commented 2 years ago

If these patterns are failing it would be a bug or bad API.

Not quite sure if u mean a bug in screen-13 or the app using screen-13. The big issue i've run into is that basically all renderers at some point will require the render graph to just trust that the app is writing to mapped buffers correctly. I am not quite sure how to implement dynamic constants (see kajiya-backend for what i mean), an ideal implementation would keep a single buffer that is mapped at all times, that is the fastest way other than push constants (but those are limited obviously). But this does not work with screen-13 because you cannot bind a buffer AND write to it, other than i guess using cmd_update_buffer but that is kind of weird for mappable memory. I guess eventually i will have to implement something like RawBufferBinding that just takes a handle and arbitrarily binds it, relying on the user to not do weird things like override data that is being read.

RDambrosio016 commented 2 years ago

@attackgoat Also regarding rc vs arc, i think you could keep the abstraction, but make it a compile-time feature thing. I.e. use Arc by default and have an rc feature then just make a Shared<T> type or something which is arc by default or rc. This would still allow specific projects to use Rc, but it would get rid of the annoying P generics which are super painful.

attackgoat commented 2 years ago

Not quite sure if u mean a bug in screen-13 or the app using screen-13.

A bug in Screen 13 or the API it provides, not any external code. This code is new and as you have found not fully baked. I am so happy others are looking at this and helping! 🥹

Kajiya's dynamic constants could be written in Screen 13 as you mention if we implement the mapped_ptr function.

Update: The missing functions have been added

RDambrosio016 commented 2 years ago

Ah ok, i understand.

I do think that is kind of the wrong approach however, i think the better approach is adding a RawBufferNode which just takes a device address and info. This way its possible to just bind any buffer, internal or external and just ask screen-13 to trust that the buffer is valid and will be valid for the duration of the frame(s). This is more flexible and honestly safer than adding a mapped_ptr function (although that function should be added either ways)

attackgoat commented 2 years ago

I like the idea but the amount of duplicated boilerplate code wrt nodes/bindings (ugh, this is my fault!) make that challenging considering these buffers will behave the same as "normal" ones except for dropping.

Could we instead add new API to Buffer so that you may create a new instance given your own vk::Buffer handle, and we'll promise not to drop it on you? This is what swapchain images do; when in "swapchain mode" they don't call destroy.

RDambrosio016 commented 2 years ago

Yeah that could probably work, though it might be ambiguous to tell whether a buffer is external or not

attackgoat commented 2 years ago

@RDambrosio016 I think #24 is going to make your code mergable - Do you have any depth things you want to PR?

attackgoat commented 2 years ago

25 Should also improve some of the things I picked up from this thread.

attackgoat / screen-13

Bugs found when taking depth test for a test drive #17

25 Should also improve some of the things I picked up from this thread.