Open hodlen opened 10 months ago
Currently, PowerInfer runs its sparse operators on CUDA cores, which is inefficient for prompt-phase computation. To better support multi-batch serving, PowerInfer plans to use Tensor Cores to further optimize the sparse operators.