BruOp / zec


Possible to eliminate gather in ISPC culling. #3

Open pbrubaker opened 9 months ago

pbrubaker commented 9 months ago

I know it's been a while since this repo has been active, but I stumbled upon your work while developing an ISPC training class for my team internally at Intel. One of the exercises in the homework is going to be frustum culling.

I wanted to point out that it's possible to eliminate the gather in the foreach() loop by switching to uniform for() loops and using the aos_to_soa4() standard library function to load and shuffle instead of calling vgatherdps. It is also possible to write your own transpose functions using static varying permutation values and the shuffle() standard library function.

I also added inline to the other functions in the file, which should make it a bit faster too. Note that I haven't tested this code; I put it together quickly to illustrate that you don't need to use foreach(), and that sometimes (as here, where aos_to_soa4() needs a uniform base address) a uniform for loop is more performant than foreach().
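For anyone following along, here is a rough scalar C++ model of the idea. It is not the ISPC from the Compiler Explorer link. It assumes bounding spheres stored as interleaved (x, y, z, radius) floats and a single plane test. The names `kWidth`, `aosToSoa4`, and `cullAgainstPlane` are made up for illustration, with `kWidth` standing in for ISPC's programCount. The outer loop steps a whole gang at a time, so each block starts at one base address and can be read contiguously and deinterleaved rather than gathered lane by lane.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical gang width, standing in for ISPC's programCount.
constexpr std::size_t kWidth = 8;

// Scalar model of an AoS-to-SoA load: reads 4 * kWidth contiguous floats
// laid out as x0 y0 z0 r0 x1 y1 z1 r1 ... and deinterleaves them into
// four arrays of kWidth values each.
inline void aosToSoa4(const float* aos, float* x, float* y, float* z, float* r) {
    for (std::size_t lane = 0; lane < kWidth; ++lane) {
        x[lane] = aos[4 * lane + 0];
        y[lane] = aos[4 * lane + 1];
        z[lane] = aos[4 * lane + 2];
        r[lane] = aos[4 * lane + 3];
    }
}

// Assumed culling test: a sphere is kept when it is not entirely behind
// the plane (a, b, c, d). Returns one 0/1 flag per sphere.
inline std::vector<int> cullAgainstPlane(const std::vector<float>& spheres,
                                         const float plane[4]) {
    assert(spheres.size() % 4 == 0);
    const std::size_t count = spheres.size() / 4;
    assert(count % kWidth == 0);  // this sketch has no remainder loop

    std::vector<int> visible(count);
    // The "uniform for" loop: base is the same for every lane, so each
    // block has a single known starting address.
    for (std::size_t base = 0; base < count; base += kWidth) {
        float x[kWidth], y[kWidth], z[kWidth], r[kWidth];
        aosToSoa4(&spheres[4 * base], x, y, z, r);
        for (std::size_t lane = 0; lane < kWidth; ++lane) {
            const float dist = plane[0] * x[lane] + plane[1] * y[lane] +
                               plane[2] * z[lane] + plane[3];
            visible[base + lane] = (dist > -r[lane]) ? 1 : 0;
        }
    }
    return visible;
}
```

In ISPC the inner lane loops disappear, since each of x, y, z, r would be a single varying value; a real version would also need a remainder loop for counts that are not a multiple of the gang width.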

Here are the changes to the code on Compiler Explorer. If you have any questions please let me know. Wish I had time to fully test this - but I'm looking forward to seeing what difference this makes.

https://ispc.godbolt.org/z/zjsKPaGME

(Also, thanks for the plug to my article on the Intel developer site!)

BruOp commented 7 months ago

Hey thanks for reaching out and letting me know! The timing is fortuitous as I've started working on this repo again in the last few months, but I totally missed this comment somehow. I'll take a look very soon and get back to you with the results 😄

pbrubaker commented 7 months ago

Awesome! Let me know if I can help more.