pbrubaker opened 9 months ago
Hey, thanks for reaching out and letting me know! The timing is fortuitous, as I've started working on this repo again in the last few months, but I somehow totally missed this comment. I'll take a look very soon and get back to you with the results 😄
Awesome! Let me know if I can help more.
I know it's been a while since this repo has been active, but I stumbled upon your work while developing an ISPC training class for my team internally at Intel. One of the exercises in the homework is going to be frustum culling.
I wanted to point out that it's possible to eliminate the gather in the `foreach()` loop by switching to `uniform` `for()` loops and using the `aos_to_soa4()` standard library function to load and shuffle the data instead of calling `vgatherdps`.
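As a rough illustration of the data movement involved, here is a scalar C model of an 8-wide gang. It is not ISPC and not the code behind the Compiler Explorer link. The sphere layout (x, y, z, radius), the inward-facing plane convention, and the names `aos_to_soa4_model` and `cull_spheres` are assumptions made for this sketch; only `aos_to_soa4()` itself is an ISPC standard library function. The point is that each iteration of the uniform loop reads 4 × 8 contiguous floats from one uniform base address, which is what lets `aos_to_soa4()` use ordinary vector loads plus shuffles instead of per-lane gathers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: an 8-wide gang (e.g. an AVX2 target), modeled as plain arrays. */
#define PROGRAM_COUNT 8

/* One "varying float": one value per program instance (lane). */
typedef struct { float lane[PROGRAM_COUNT]; } varying_float;

/* Models ISPC's aos_to_soa4(): reads 4 * PROGRAM_COUNT contiguous floats laid
   out as x0 y0 z0 w0 x1 y1 z1 w1 ... and splits them into four varyings.
   In ISPC this is done with contiguous vector loads from a uniform base
   address, not with per-lane gathers. */
static void aos_to_soa4_model(const float *a, varying_float *v0,
                              varying_float *v1, varying_float *v2,
                              varying_float *v3) {
    for (int l = 0; l < PROGRAM_COUNT; ++l) {
        v0->lane[l] = a[4 * l + 0];
        v1->lane[l] = a[4 * l + 1];
        v2->lane[l] = a[4 * l + 2];
        v3->lane[l] = a[4 * l + 3];
    }
}

/* Hypothetical culling kernel: spheres are AoS (x, y, z, radius); planes are
   (nx, ny, nz, d) with inward-facing normals. A sphere is visible unless it
   lies entirely behind some plane. Assumes count % PROGRAM_COUNT == 0; a real
   kernel would also need a masked tail.
   Rough ISPC shape of the outer loop:
     for (uniform int i = 0; i < count; i += programCount) {
         float x, y, z, r;
         aos_to_soa4(&spheres[4 * i], &x, &y, &z, &r);
         ...
     }                                                                      */
static void cull_spheres(const float *spheres, size_t count,
                         const float planes[6][4], bool *visible) {
    for (size_t i = 0; i < count; i += PROGRAM_COUNT) {   /* uniform loop */
        varying_float x, y, z, r;
        aos_to_soa4_model(&spheres[4 * i], &x, &y, &z, &r);
        for (int l = 0; l < PROGRAM_COUNT; ++l) {          /* the gang's lanes */
            bool inside = true;
            for (int p = 0; p < 6; ++p) {
                float dist = planes[p][0] * x.lane[l] + planes[p][1] * y.lane[l] +
                             planes[p][2] * z.lane[l] + planes[p][3];
                if (dist < -r.lane[l]) inside = false;     /* fully outside */
            }
            visible[i + l] = inside;
        }
    }
}
```

The inner per-lane loop stands in for what the gang executes in parallel; only the outer loop corresponds to the `uniform` `for()` loop.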
It is also possible to create your own transpose functions by using static varying permutation values with the `shuffle()` standard library function.

I also added `inline` to the other functions in the file, which should make it a bit faster as well. Note that I haven't tested this code; I did this quickly to illustrate that you don't need to use `foreach()`, and that sometimes (as here, where `aos_to_soa4()` needs a uniform base address) a uniform `for` loop is more performant than `foreach()`.

Here are the changes to the code on Compiler Explorer. If you have any questions, please let me know. I wish I had time to fully test this, but I'm looking forward to seeing what difference this makes.
https://ispc.godbolt.org/z/zjsKPaGME
(Also, thanks for the plug for my article on the Intel developer site!)
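The custom-transpose idea (fixed permutation values fed to `shuffle()`) can be sketched the same way. This is again a scalar C model with hypothetical names (`shuffle2_model`, `transpose_xy`), and it uses a two-component (x, y) layout only to keep it short. It models ISPC's two-input `shuffle(v0, v1, permute)`, where each lane picks element `permute` from the concatenation of `v0` and `v1`. Two contiguous loads plus two shuffles with compile-time permutations take the place of two gathers.

```c
#include <assert.h>

/* Assumption: an 8-wide gang, modeled as plain arrays. */
#define PROGRAM_COUNT 8

typedef struct { float lane[PROGRAM_COUNT]; } varying_float;

/* Models ISPC's shuffle(v0, v1, permute): lane l of the result is element
   perm[l] of the 2 * PROGRAM_COUNT-element concatenation v0 ++ v1. */
static varying_float shuffle2_model(varying_float v0, varying_float v1,
                                    const int perm[PROGRAM_COUNT]) {
    varying_float out;
    for (int l = 0; l < PROGRAM_COUNT; ++l) {
        int p = perm[l];
        out.lane[l] = (p < PROGRAM_COUNT) ? v0.lane[p] : v1.lane[p - PROGRAM_COUNT];
    }
    return out;
}

/* Fixed permutations selecting the even (x) and odd (y) elements of v0 ++ v1.
   In ISPC these could be written as 2 * programIndex and 2 * programIndex + 1. */
static const int EVEN_PERM[PROGRAM_COUNT] = {0, 2, 4, 6, 8, 10, 12, 14};
static const int ODD_PERM[PROGRAM_COUNT]  = {1, 3, 5, 7, 9, 11, 13, 15};

/* Contiguous load of one varying from a uniform base address
   (ISPC: float v = a[programIndex];). */
static varying_float load_contiguous(const float *a) {
    varying_float v;
    for (int l = 0; l < PROGRAM_COUNT; ++l) v.lane[l] = a[l];
    return v;
}

/* Hypothetical AoS -> SoA transpose for x0 y0 x1 y1 ... x7 y7:
   two contiguous loads and two shuffles, no gathers. */
static void transpose_xy(const float *a, varying_float *xs, varying_float *ys) {
    varying_float lo = load_contiguous(a);                 /* x0 y0 ... x3 y3 */
    varying_float hi = load_contiguous(a + PROGRAM_COUNT); /* x4 y4 ... x7 y7 */
    *xs = shuffle2_model(lo, hi, EVEN_PERM);
    *ys = shuffle2_model(lo, hi, ODD_PERM);
}
```

Because the permutations are known at compile time, a compiler can typically lower such shuffles to fixed permute instructions, which is the property the custom-transpose approach relies on.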