Closed johnryan465 closed 1 year ago
Making the shared memory D times larger reduces the number of collisions when performing atomic adds, which reduces the overall time. Current benchmark time: 151 seconds on a 3090.
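The kernel changed in this PR is not shown here, so the following is only a minimal sketch of the general idea, assuming a histogram-style accumulation. The names `histogram_replicated`, `run_histogram_check`, `NUM_BINS` and `THREADS`, and the choice `D = 8`, are hypothetical. Each block keeps D copies of its accumulators in shared memory, each thread adds into the copy selected by `threadIdx.x % D`, and the copies are summed once at the end, so threads that hit the same bin are less likely to contend on the same address.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 64;   // hypothetical number of accumulators
constexpr int D = 8;           // replication factor: shared memory is D times larger
constexpr int THREADS = 256;   // threads per block

__global__ void histogram_replicated(const int* data, int n, unsigned int* out) {
    // D private copies of the accumulators live in shared memory.
    __shared__ unsigned int bins[D * NUM_BINS];
    for (int i = threadIdx.x; i < D * NUM_BINS; i += blockDim.x) bins[i] = 0;
    __syncthreads();

    // Threads are spread over the D copies, so two threads updating the
    // same bin only collide if they also share a copy.
    const int replica = threadIdx.x % D;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        const int b = data[i];  // assumed to lie in [0, NUM_BINS)
        atomicAdd(&bins[replica * NUM_BINS + b], 1u);
    }
    __syncthreads();

    // Sum the D copies and publish one value per bin to global memory.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        unsigned int sum = 0;
        for (int r = 0; r < D; ++r) sum += bins[r * NUM_BINS + b];
        atomicAdd(&out[b], sum);
    }
}

// Runs the kernel on skewed input and compares against a CPU reference.
bool run_histogram_check() {
    const int n = 1 << 20;
    std::vector<int> h_data(n);
    std::vector<unsigned int> expected(NUM_BINS, 0), h_out(NUM_BINS, 0);
    for (int i = 0; i < n; ++i) {
        h_data[i] = (i * 7) % 5;  // only 5 bins are used, so collisions are frequent
        ++expected[h_data[i]];
    }

    int* d_data = nullptr;
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, NUM_BINS * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_data);
        return false;
    }
    cudaMemcpy(d_data, h_data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, NUM_BINS * sizeof(unsigned int));

    histogram_replicated<<<64, THREADS>>>(d_data, n, d_out);
    const cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);
    return err == cudaSuccess && h_out == expected;
}

int main() {
    const bool ok = run_histogram_check();
    std::printf("%s\n", ok ? "histogram matches CPU reference" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

The trade-off in this sketch is that a larger D uses more shared memory per block, which can lower occupancy, and the final reduction costs D reads per bin. The best value of D would need to be found by benchmarking.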