Devsh-Graphics-Programming / Nabla

Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android
http://devsh.eu
Apache License 2.0

Work on property pool HLSL impl #649

Open deprilula28 opened 5 months ago


Description

Implementing CPropertyPoolHandler and CPropertyPool in HLSL, using buffer device addresses (BDA) instead of allocating descriptor sets for buffers. Notes about the implementation:

-> Currently uses descriptor pools (a descriptor set has to be allocated every time)
    -> Use BDA instead, passing the addresses as root constants (push constants in Vulkan)
-> Device capabilities traits
    -> Example version: https://github.com/Devsh-Graphics-Programming/Nabla/blob/vulkan_1_3/include/nbl/builtin/hlsl/device_capabilities_traits.hlsl
    -> maxOptimallyResidentWorkgroupInvocations
    -> Can use the nbl::hlsl::jit::device_capabilities struct, whose JIT-generated "constexpr" variables provide the optimal number of workgroup invocations
    https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/master/23_ArithmeticUnitTest/app_resources/shaderCommon.hlsl#L9
    https://github.com/microsoft/DirectXShaderCompiler/issues/6144
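As a minimal sketch of the BDA approach, assuming a hypothetical layout (none of these names come from Nabla), the host-side mirror of the push constants could carry a single 64-bit device address plus a request count, with a null address meaning "invalid". The struct must fit Vulkan's guaranteed minimum `maxPushConstantsSize` of 128 bytes. `example_device_capabilities` only imitates the role of the JIT-generated `nbl::hlsl::jit::device_capabilities`; its value is made up, and using it directly as the workgroup size is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mirror of the copy shader's push constants. Buffers are
// referenced by 64-bit device addresses instead of descriptor set bindings.
struct TransferPushConstants
{
    uint64_t transferRequestsAddress; // BDA of the transfer request array, 0 = invalid
    uint32_t requestCount;            // number of requests stored at that address
    uint32_t padding;                 // keeps the struct size a multiple of 8 bytes
};

// Vulkan guarantees at least 128 bytes of push constant space.
static_assert(sizeof(TransferPushConstants) <= 128, "push constants exceed the guaranteed minimum");

// A null device address marks "not provided", as proposed for the new version.
constexpr bool isValidAddress(uint64_t address) { return address != 0ull; }

// Imitation of a JIT-generated capabilities struct: values are compile-time constants.
// The number below is made up for illustration.
struct example_device_capabilities
{
    static constexpr uint32_t maxOptimallyResidentWorkgroupInvocations = 1024u;
};

// Hypothetical: the persistent-thread copy shader sizes its workgroup from the caps.
template<typename Caps>
struct CopyShaderConfig
{
    static constexpr uint32_t WorkgroupSize = Caps::maxOptimallyResidentWorkgroupInvocations;
};

TransferPushConstants makePushConstants(uint64_t requestsAddress, uint32_t requestCount)
{
    // A non-empty transfer must point at real memory.
    assert(isValidAddress(requestsAddress) || requestCount == 0);
    return TransferPushConstants{requestsAddress, requestCount, 0u};
}
```

Because the address lives in push constants, nothing has to be allocated per transfer; only the buffer holding the requests needs to outlive the command buffer.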

=== tasks ===

-> Port https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/include/nbl/builtin/glsl/property_pool/copy.comp to HLSL
    -> Persistently resident threads looping over the work, maximally using the GPU workgroup size
    -> Dispatch: 2D (x: DWORD within the property, y: property ID)
        -> property ID: which buffer you're touching (analogous to a draw ID)
            -> indexes into transferData
            -> new version: use null pointer as invalid pointer
    -> transferData: List of copy "commands"
        -> new version: Replaced by push constant with BDA address
    -> addresses: "Index buffer"
        -> invalid pointer: IOTA (analogous to not using an index buffer; the iteration index is used as the fetch index)
    -> Use shorts (uint16) instead of DWORDs (uint32)
        -> Transfer data struct uses bytes for future proofing
    -> Specialize on:
        -> Whether or not source is a fill
        -> Type of index (uint8, uint16, uint32, uint64)
        -> Src index is IOTA
        -> Dst index is IOTA
    -> Keep the modulo optimizations (lines 38 & 52 of the original copy.comp)
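The copy kernel described above can be emulated on the CPU, which is handy as a reference when testing the HLSL port. This is a hypothetical sketch, not the shader: host pointers stand in for BDAs (nullptr meaning "not provided"), element sizes are in bytes and assumed to be whole DWORDs, a fill is assumed to replicate source element 0, and the persistent-thread loop is modelled by each emulated invocation striding over all DWORDs of its property. Fill, index type and IOTA are template parameters, mirroring the proposed specializations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side transfer request; host pointers stand in for device addresses.
template<typename IndexT>
struct TransferRequest
{
    const uint32_t* src;       // source data (for a fill: only element 0 is read)
    uint32_t* dst;             // destination data
    const IndexT* srcIndices;  // nullptr when the source index is IOTA
    const IndexT* dstIndices;  // nullptr when the destination index is IOTA
    uint32_t elementCount;
    uint32_t elementSizeBytes; // assumed to be a multiple of 4 (whole DWORDs)
};

// Emulates a 2D dispatch: Y = property ID (one request each), X = invocations that
// persistently stride over every DWORD of that property.
template<typename IndexT, bool Fill, bool SrcIota, bool DstIota>
void emulateCopyDispatch(const std::vector<TransferRequest<IndexT>>& requests,
                         uint32_t invocationsPerProperty)
{
    for (size_t propertyID = 0; propertyID < requests.size(); ++propertyID)
    {
        const TransferRequest<IndexT>& req = requests[propertyID];
        assert(req.elementSizeBytes % 4u == 0u);
        const uint64_t elementDWORDs = req.elementSizeBytes / 4u;
        const uint64_t totalDWORDs = uint64_t(req.elementCount) * elementDWORDs;
        for (uint32_t invocation = 0; invocation < invocationsPerProperty; ++invocation)
        {
            for (uint64_t dword = invocation; dword < totalDWORDs; dword += invocationsPerProperty)
            {
                const uint64_t element = dword / elementDWORDs;
                const uint64_t dwordInElement = dword % elementDWORDs;

                uint64_t srcElement;
                if constexpr (Fill)
                    srcElement = 0; // every destination element receives the same value
                else if constexpr (SrcIota)
                    srcElement = element;
                else
                    srcElement = req.srcIndices[element];

                uint64_t dstElement;
                if constexpr (DstIota)
                    dstElement = element;
                else
                    dstElement = req.dstIndices[element];

                req.dst[dstElement * elementDWORDs + dwordInElement] =
                    req.src[srcElement * elementDWORDs + dwordInElement];
            }
        }
    }
}
```

With `DstIota = false` and a destination index buffer `{1, 0}`, two 8-byte elements are written in swapped order, which is the scatter case the index buffer exists for.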

-> CPU Code
    -> CPropertyPoolHandler
        -> Nuke m_maxPropertiesPerPass and getMaxScratchSize (not relevant to the BDA version)
    -> TransferRequest on the CPU keeps a reference to the buffer and places it in the command buffer for lifetime tracking
        -> Have a custom command that just keeps track of a **variable number** of reference counted objects for preserving lifetimes (LinkedPreservedLifetimes?)
            -> Take a span of IGPUReferenceCounted
            -> Example: https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/IGPUCommandBuffer.cpp#L104C54-L104C54
            -> For variable amount of stuff: https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/IGPUCommandBuffer.cpp#L403C90-L403C90
            -> Signature example: `IGPUCommandBuffer::preserveLifetime(std::span<const core::IReferenceCounted>)`
    -> New transfer property signature
        -> make pipeline barriers more robust (or require everything to be done properly outside the function)
        -> First parameter: SIntendedSubmitInfo (an IUtilities-independent submit info struct for handling overflows)
            -> Source for it: https://github.com/Devsh-Graphics-Programming/Nabla/blob/vulkan_1_3/include/nbl/video/utilities/SIntendedSubmitInfo.h
            -> Move IUtilities::autoSubmit and IUtilities::autoSubmitAndBlock to SIntendedSubmitInfo as static methods (no more relation to IUtilities)
        -> Second parameter: struct with parameters
            `const asset::SBufferBinding<video::IGPUBuffer>& scratch, system::logger_opt_ptr logger, const size_t baseDWORD=0ull, const size_t endDWORD=~0u`
            -> Additional optional parameters, including extra pipeline barrier values, e.g.
            `bitfield/boolean [pre|post]ScratchBarrier = true`
    -> Let's keep MaxPropertiesPerDispatch and have it equal to 64 KiB / sizeof(nbl::hlsl::property_pools::transferRequest)
        -> instead of the copy lambda logic at https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/utilities/CPropertyPoolHandler.cpp#L172, fail if the request count exceeds MaxPropertiesPerDispatch
    -> leave upstreaming & contiguous buffers for later (`#if 0` them out)
        -> transferProperties with upstreaming & freeProperties
    -> IPropertyPool
        -> allocateProperties: use span instead of begin & end
            -> Behaviour:
                -> goes through indices to find empty ones and allocate them
                -> if it's contiguous: add mapping from index to addr and addr to index
        -> nuke the descriptor set code (lines 198-211)
        -> validateBlocks: change offset check (https://github.com/Devsh-Graphics-Programming/Nabla/blob/64cbb652e39acf0239a61bcee7fc26d70ab8d089/src/nbl/video/utilities/IPropertyPool.cpp#L38) to BDA
            -> check usages & non-null address
    -> CPropertyPool: don't change anything, just make sure the indentation is right
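Two of the CPU-side points above can be sketched together, under loud assumptions. `TransferRequestGPU` is a made-up stand-in for `nbl::hlsl::property_pools::transferRequest` (the real layout may differ), so the resulting limit is only illustrative of the "64 KiB / sizeof(request)" rule and the "fail instead of splitting" policy. `PreservedLifetimesCommand` is a hypothetical command that keeps a variable number of reference-counted objects alive; `std::shared_ptr` stands in for Nabla's intrusive reference counting, and a `std::vector` replaces the proposed `std::span` parameter so the sketch stays C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the GPU-side transfer request struct.
struct TransferRequestGPU
{
    uint64_t srcAddr, dstAddr, srcIndexAddr, dstIndexAddr; // device addresses, 0 = invalid
    uint64_t elementCount;
    uint32_t elementSizeBytes;
    uint32_t flags;
};

// 64 KiB worth of requests per dispatch.
constexpr size_t MaxPropertiesPerDispatch = (64u * 1024u) / sizeof(TransferRequestGPU);

// Reject oversized transfers instead of splitting them over several passes.
bool validateTransferCount(size_t requestCount)
{
    return requestCount <= MaxPropertiesPerDispatch;
}

// Hypothetical command that holds strong references until it is destroyed,
// i.e. until the owning command buffer is reset or freed.
class PreservedLifetimesCommand
{
public:
    explicit PreservedLifetimesCommand(std::vector<std::shared_ptr<const void>> objects)
        : m_objects(std::move(objects))
    {
        for (const auto& object : m_objects)
            assert(object != nullptr); // every preserved object must be non-null
    }
    size_t size() const { return m_objects.size(); }

private:
    std::vector<std::shared_ptr<const void>> m_objects;
};
```

Storing type-erased strong references lets one command preserve any mix of buffers in a single allocation, rather than one command per object.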

    -> MegaDescriptorSet (descriptor set sub-allocation)
        -> Have a multi-timeline event functor with IFuture await
        ```cpp
            // deferred frees are latched onto a future wait and run once it completes
            MultiTimelineEventHandlerST<DeferredFreeFunctor> deferredFrees;
            deferredFrees.latch(futureWait, std::move(functor));
        ```
        -> Also have it on IPropertyPool
        -> Solve synchronization issues
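The deferred-free idea can be illustrated with a single-threaded toy model. Everything here is hypothetical: `FakeTimelineSemaphore` replaces a real timeline semaphore with a plain counter, `SemaphoreWait` plays the role of the future passed to `latch()`, and `MultiTimelineEventHandlerSketch` is not Nabla's `MultiTimelineEventHandlerST`, only a model of its described behaviour (functors run once the wait they were latched onto is satisfied).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for a timeline semaphore: a monotonically increasing counter.
struct FakeTimelineSemaphore { uint64_t counter = 0; };

// Stand-in for the future wait: which semaphore, and which value it must reach.
struct SemaphoreWait { const FakeTimelineSemaphore* semaphore; uint64_t value; };

// Toy handler: functors are latched onto waits and executed by poll() once satisfied.
template<typename Functor>
class MultiTimelineEventHandlerSketch
{
public:
    void latch(SemaphoreWait wait, Functor&& functor)
    {
        assert(wait.semaphore != nullptr);
        m_pending.emplace_back(wait, std::move(functor));
    }

    // Runs and drops every functor whose semaphore reached its value; returns how many ran.
    size_t poll()
    {
        size_t executed = 0;
        for (auto it = m_pending.begin(); it != m_pending.end();)
        {
            if (it->first.semaphore->counter >= it->first.value)
            {
                it->second();
                it = m_pending.erase(it);
                ++executed;
            }
            else
                ++it;
        }
        return executed;
    }

    size_t pendingCount() const { return m_pending.size(); }

private:
    std::vector<std::pair<SemaphoreWait, Functor>> m_pending;
};

// Hypothetical deferred free: returns a sub-allocated offset to a free list.
struct DeferredFreeFunctor
{
    std::vector<uint32_t>* freeList;
    uint32_t offset;
    void operator()() const { freeList->push_back(offset); }
};
```

Freeing only after the GPU signals the wait is what prevents a sub-allocated descriptor range from being reused while a submission still reads it.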

-> create an example testing downloads and uploads of properties
    -> with and without an index buffer (IB), fills, etc.
    -> use regular buffers for everything
    -> later, test the streaming buffers (once they are `#if`'d back in)
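One way to make the example cover every shader specialization listed in the tasks (fill or not, index type, source IOTA, destination IOTA) is to enumerate the full matrix. This is a hypothetical helper, not existing Nabla code; some combinations may turn out redundant (for instance a fill presumably ignores the source index) and could be pruned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one copy-shader permutation to exercise.
struct CopyTestCase
{
    bool fill;
    uint32_t indexTypeBytes; // 1, 2, 4 or 8 (uint8 .. uint64)
    bool srcIota;            // no source index buffer
    bool dstIota;            // no destination index buffer
};

std::vector<CopyTestCase> enumerateCopyTestCases()
{
    std::vector<CopyTestCase> cases;
    for (bool fill : {false, true})
        for (uint32_t indexTypeBytes : {1u, 2u, 4u, 8u})
            for (bool srcIota : {false, true})
                for (bool dstIota : {false, true})
                    cases.push_back({fill, indexTypeBytes, srcIota, dstIota});
    assert(cases.size() == 2u * 4u * 2u * 2u);
    return cases;
}
```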


## Testing 
<!-- Explain how this change was tested. -->

## TODO list:
<!-- A list of things that have to be finished before this PR can be merged -->

- [ ] Investigate why data isn't being written correctly
- [ ] Implement address buffer handling
- [ ] Baseline test
- [ ] Test with IOTA
- [ ] Test with fill buffers
- [ ] Test with different element sizes
- [ ] Test with different element counts 
- [ ] Test with different transfer amounts

<!--
By creating this pull request into Nabla, you agree to release all your past (even from previous commits) and present contributions in the Nabla repository under the Apache 2.0 license. If you're not the sole contributor, ensure that all contributors have signed the CLA agreeing to this.
-->