KOKIAOKI / 3d_bbs

MIT License

Fix: Simplify and improve the speed of score calculation using one kernel #5

Closed: KOKIAOKI closed this 9 months ago

KOKIAOKI commented 9 months ago

I replaced the CUDA-graph-based score calculation with a single kernel launch. The updated score calculation is now twice as fast as the previous one. This PR also fixes a bug that occurred when the number of source points exceeded approximately 5000 or when VGF was disabled.

(Block size was fixed after this PR.)
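To illustrate what "one kernel instead of a CUDA graph" can look like, here is a minimal sketch, not the code from this PR. It assumes a dense occupancy grid for the target map and a flat list of candidate poses; `Transform`, `Grid`, `score_kernel` and `run_example` are hypothetical names. Each thread transforms one source point under one candidate pose, looks up its voxel and increments that pose's score. A grid-stride loop lets a single launch cover any number of (pose, point) pairs. This is one general way to avoid limits that depend on point count, not a claim about what the ~5000-point bug actually was.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical candidate pose: row-major 3x3 rotation R and translation t.
struct Transform {
  float R[9];
  float t[3];
};

// Hypothetical dense occupancy grid (assumption: 3d_bbs may store voxel maps differently).
struct Grid {
  const unsigned char* occ;  // occ[(iz * ny + iy) * nx + ix] == 1 if the voxel is occupied
  int nx, ny, nz;
  float res;                 // voxel edge length
  float origin[3];           // world position of the corner of voxel (0, 0, 0)
};

// One launch scores every (transform, point) pair via a grid-stride loop.
__global__ void score_kernel(const float3* pts, int num_pts, const Transform* tfs, int num_tfs,
                             Grid grid, int* scores) {
  const long long total = (long long)num_pts * num_tfs;
  const long long stride = (long long)blockDim.x * gridDim.x;
  for (long long i = (long long)blockIdx.x * blockDim.x + threadIdx.x; i < total; i += stride) {
    const int tf_idx = (int)(i / num_pts);
    const int pt_idx = (int)(i % num_pts);
    const Transform& T = tfs[tf_idx];
    const float3 p = pts[pt_idx];

    // Apply the rigid transform to the source point.
    const float x = T.R[0] * p.x + T.R[1] * p.y + T.R[2] * p.z + T.t[0];
    const float y = T.R[3] * p.x + T.R[4] * p.y + T.R[5] * p.z + T.t[1];
    const float z = T.R[6] * p.x + T.R[7] * p.y + T.R[8] * p.z + T.t[2];

    // Convert to voxel indices; points outside the grid contribute nothing.
    const int ix = (int)floorf((x - grid.origin[0]) / grid.res);
    const int iy = (int)floorf((y - grid.origin[1]) / grid.res);
    const int iz = (int)floorf((z - grid.origin[2]) / grid.res);
    if (ix < 0 || iy < 0 || iz < 0 || ix >= grid.nx || iy >= grid.ny || iz >= grid.nz) continue;

    // Score of a pose = number of its transformed points that land in occupied voxels.
    if (grid.occ[((long long)iz * grid.ny + iy) * grid.nx + ix]) atomicAdd(&scores[tf_idx], 1);
  }
}

// Host driver: 4x4x4 grid with only voxel (1, 1, 1) occupied, three candidate poses.
std::vector<int> run_example() {
  const int n = 4;
  std::vector<unsigned char> occ(n * n * n, 0);
  occ[(1 * n + 1) * n + 1] = 1;

  std::vector<float3> pts = {make_float3(1.5f, 1.5f, 1.5f), make_float3(2.5f, 2.5f, 2.5f),
                             make_float3(1.2f, 1.8f, 1.1f)};
  const Transform identity{{1.f, 0.f, 0.f, 0.f, 1.f, 0.f, 0.f, 0.f, 1.f}, {0.f, 0.f, 0.f}};
  std::vector<Transform> tfs(3, identity);
  for (int k = 0; k < 3; ++k) { tfs[1].t[k] = 1.f; tfs[2].t[k] = -1.f; }

  unsigned char* d_occ; float3* d_pts; Transform* d_tfs; int* d_scores;
  cudaMalloc(&d_occ, occ.size());
  cudaMalloc(&d_pts, pts.size() * sizeof(float3));
  cudaMalloc(&d_tfs, tfs.size() * sizeof(Transform));
  cudaMalloc(&d_scores, tfs.size() * sizeof(int));
  cudaMemcpy(d_occ, occ.data(), occ.size(), cudaMemcpyHostToDevice);
  cudaMemcpy(d_pts, pts.data(), pts.size() * sizeof(float3), cudaMemcpyHostToDevice);
  cudaMemcpy(d_tfs, tfs.data(), tfs.size() * sizeof(Transform), cudaMemcpyHostToDevice);
  cudaMemset(d_scores, 0, tfs.size() * sizeof(int));

  const Grid grid{d_occ, n, n, n, 1.0f, {0.f, 0.f, 0.f}};
  const long long total = (long long)pts.size() * tfs.size();
  const int threads = 256;
  const int blocks = (int)std::min<long long>((total + threads - 1) / threads, 1024);
  score_kernel<<<blocks, threads>>>(d_pts, (int)pts.size(), d_tfs, (int)tfs.size(), grid, d_scores);

  std::vector<int> scores(tfs.size());
  cudaMemcpy(scores.data(), d_scores, scores.size() * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_occ); cudaFree(d_pts); cudaFree(d_tfs); cudaFree(d_scores);
  return scores;
}

int main() {
  const std::vector<int> scores = run_example();
  for (size_t k = 0; k < scores.size(); ++k) std::printf("transform %zu: score %d\n", k, scores[k]);
  assert((scores == std::vector<int>{2, 0, 1}));
  return 0;
}
```

Fusing the transform, the voxel lookup and the counting into one kernel avoids separate launches and intermediate buffers between stages. A per-pose `atomicAdd` is the simplest reduction; a block-level reduction before the atomic would reduce contention when many points hit the same pose.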