7% of the time is spent calling the destructor for std::vector.
The get data call cannot be improved directly here, but the coordinate call and the destructor can. Half of the time in the coordinate call goes to modular division, so a faster way to compute it would help.
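As a rough illustration of where the modular division cost comes from, here is a minimal sketch of turning a flat voxel index into 3D coordinates. The function names and the index layout (index = x + nx * (y + ny * z)) are assumptions made for this example, not the actual coordinate implementation. The second version derives each remainder from the quotient with a multiply and subtract, so it needs one division per axis instead of a division plus a modulo.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helpers for illustration only.
// Assumed layout: index = x + nx * (y + ny * z).

// Baseline: one division and one modulo per axis.
std::array<std::size_t, 3> unravel_divmod(std::size_t index,
                                          std::size_t nx,
                                          std::size_t ny) {
    std::size_t x = index % nx;
    std::size_t rest = index / nx;
    std::size_t y = rest % ny;
    std::size_t z = rest / ny;
    return {x, y, z};
}

// Variant: compute the quotient once, then recover the remainder
// as (value - quotient * divisor) instead of a second modulo.
std::array<std::size_t, 3> unravel_mulsub(std::size_t index,
                                          std::size_t nx,
                                          std::size_t ny) {
    std::size_t q1 = index / nx;      // y + ny * z
    std::size_t x = index - q1 * nx;  // same value as index % nx
    std::size_t z = q1 / ny;
    std::size_t y = q1 - z * ny;      // same value as q1 % ny
    return {x, y, z};
}
```

Many compilers already fuse an adjacent division and modulo into a single instruction, so any gain from the second form should be confirmed by profiling rather than assumed.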
Batch Filler Sparse Tensor 3D shows two significant hotspots and one minor one. Tested with the NEXT NEW sparse 3D simulation:
The get data call cannot be improved directly here, but the coordinate call and the destructor can. Half of the time in the coordinate call goes to modular division, so a faster way to compute it would help.
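One possible way to cut the std::vector destructor cost, assuming it comes from a temporary coordinate vector created for every voxel, is to fill a buffer owned by the caller and reuse it across the loop. The sketch below uses hypothetical function names and is not the existing API; it only contrasts the two calling styles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: returns a fresh vector, so every call pays for
// one allocation and one destructor.
std::vector<std::size_t> coords_by_value(std::size_t index,
                                         const std::vector<std::size_t>& dims) {
    std::vector<std::size_t> out(dims.size());
    for (std::size_t axis = 0; axis < dims.size(); ++axis) {
        out[axis] = index % dims[axis];
        index /= dims[axis];
    }
    return out;
}

// Hypothetical variant: writes into a caller-owned buffer.
// resize() does not reallocate once the capacity is sufficient,
// so repeated calls reuse the same storage.
void coords_into(std::size_t index,
                 const std::vector<std::size_t>& dims,
                 std::vector<std::size_t>& out) {
    out.resize(dims.size());
    for (std::size_t axis = 0; axis < dims.size(); ++axis) {
        out[axis] = index % dims[axis];
        index /= dims[axis];
    }
}
```

With the second form, the buffer is declared once outside the per-voxel loop and passed to every call, so its destructor runs once rather than once per voxel.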