Open Dawoodoz opened 4 years ago
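Using alloca instead of VLA would remove the dependency on the VLA compiler extension, while still getting the speed of stack memory. Note that alloca is not part of ISO C or C++ either, although most compilers provide it.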
Maybe just restructure rasterization, so that the pixel intervals are sent directly to the pixel shader using a function pointer for filling two rows of pixels. Then there would be no need for VLA when rendering triangles.
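A minimal sketch of that restructuring, assuming a simple scanline rasterizer with integer vertices. The names `Point`, `RowFiller`, `rasterizeTriangle` and `countPixels` are hypothetical and not taken from the library, and the callback here receives one row at a time instead of two to keep the example short. Each interval is handed to the callback as soon as it is computed, so no per-row array is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical types for illustration, not the library's actual API.
struct Point { int32_t x, y; };

// Called once per row with the half-open pixel interval [left, right).
using RowFiller = void (*)(int32_t y, int32_t left, int32_t right, void *userData);

// Walks the rows of a triangle and forwards each row's interval directly
// to the callback, so nothing has to be stored between rows.
static void rasterizeTriangle(Point a, Point b, Point c, RowFiller fill, void *userData) {
    assert(fill != nullptr);
    const Point corners[3] = {a, b, c};
    const int32_t top = std::min({a.y, b.y, c.y});
    const int32_t bottom = std::max({a.y, b.y, c.y});
    for (int32_t y = top; y < bottom; y++) {
        int32_t left = INT32_MAX;
        int32_t right = INT32_MIN;
        for (int e = 0; e < 3; e++) {
            Point s = corners[e];
            Point t = corners[(e + 1) % 3];
            if (s.y > t.y) std::swap(s, t);
            // Half-open in y: horizontal edges never match, so t.y - s.y is never zero here.
            if (y >= s.y && y < t.y) {
                int32_t x = s.x + int32_t(int64_t(t.x - s.x) * (y - s.y) / (t.y - s.y));
                left = std::min(left, x);
                right = std::max(right, x);
            }
        }
        if (left < right) fill(y, left, right, userData);
    }
}

// Example callback standing in for a pixel shader: it only counts what it is given.
struct Counter { int32_t rows; int32_t pixels; };
static void countPixels(int32_t, int32_t left, int32_t right, void *userData) {
    Counter *counter = static_cast<Counter*>(userData);
    counter->rows++;
    counter->pixels += right - left;
}
```

A real shader callback would write pixels instead of counting them. Whether the indirect call per row is cheaper than filling a small array first would have to be measured.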
Triangle rasterization uses small but dynamic arrays for storing pixel intervals for each row without having to fetch memory far away on the heap.
In case the VLA extension can no longer be used at some point in the distant future (a new CPU architecture with a conflicting feature, et cetera), it would be good to have a fallback implementation that simulates or replaces VLA when it is not available, just like the SIMD abstraction still runs with zero overhead when the extensions are missing.
A global stack on the heap would not work when called from multiple threads, because interleaved calls break the last-in, first-out order that a stack depends on.
Carrying thread contexts would be a horribly entangled spaghetti design.
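One possible middle ground, not proposed above and shown only as a sketch: a `thread_local` scratch stack gives every thread its own stack on static storage, so the call order stays intact per thread and no context has to be passed around. The names `ScratchScope`, `scratchMemory`, `scratchTop` and the 64 KiB capacity are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical per-thread scratch stack. The capacity is an arbitrary choice.
constexpr size_t scratchCapacity = 64 * 1024;
alignas(16) thread_local uint8_t scratchMemory[scratchCapacity];
thread_local size_t scratchTop = 0;

// Everything allocated through a scope is released when the scope ends.
// Allocate only from the innermost living scope, just like with a real stack.
class ScratchScope {
    size_t restorePoint;
public:
    ScratchScope() : restorePoint(scratchTop) {}
    ~ScratchScope() { scratchTop = restorePoint; }
    ScratchScope(const ScratchScope&) = delete;
    ScratchScope& operator=(const ScratchScope&) = delete;

    // Returns uninitialized storage for count elements, or nullptr when the stack is full.
    template <typename T>
    T *allocate(size_t count) {
        static_assert(std::is_trivial<T>::value, "No constructors or destructors are called.");
        static_assert(alignof(T) <= 16, "The backing memory is only aligned to 16 bytes.");
        size_t start = (scratchTop + alignof(T) - 1) & ~(alignof(T) - 1);
        if (start > scratchCapacity || count > (scratchCapacity - start) / sizeof(T)) return nullptr;
        scratchTop = start + count * sizeof(T);
        return reinterpret_cast<T*>(scratchMemory + start);
    }
};
```

The caller has to handle the `nullptr` case, for example by falling back to the heap for unusually tall triangles.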
Allocating on the heap per triangle would be compact, but also horribly slow if it ends up causing cache misses because another thread is using nearby addresses. Pre-allocating for the height of the target's section with even padding would have enough room for the worst-case triangle height and no allocation overhead per triangle, but this would not be easily reusable for other problems needing VLA.
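The pre-allocation idea could look roughly like this. `RowInterval` and `RowIntervalBuffer` are hypothetical names, and "even padding" is interpreted here as rounding the row count up to an even number so that rows can be handled in pairs, which is an assumption about the intended meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration, not the library's actual API.
struct RowInterval { int32_t left, right; };

// Allocated once per target section and reused for every triangle drawn into it.
class RowIntervalBuffer {
    std::vector<RowInterval> rows;
public:
    // Assumption: the height is rounded up to an even row count.
    explicit RowIntervalBuffer(int32_t sectionHeight)
    : rows(size_t((sectionHeight + 1) & ~1)) {}

    int32_t capacity() const { return int32_t(rows.size()); }

    // Returns storage for one triangle's rows without allocating.
    // A triangle clipped to the section can never need more rows than the section has.
    RowInterval *rowsFor(int32_t rowCount) {
        assert(rowCount >= 0 && rowCount <= capacity());
        return rows.data();
    }
};
```

Each thread would need its own buffer for the section it renders, and as noted above the buffer is specific to rasterization, so it does not replace VLA elsewhere.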