-
Some backends, e.g. `SYCL`, don't allow querying the amount of free device space. Nevertheless, `KokkosSPGEMM::compute_num_pool_chunks`, https://github.com/kokkos/kokkos-kernels/blob/0a1740890e84a145d6…
-
Hello,
I have been testing ParILUT with GMRES with linear systems extracted from my application.
I am seeing NaN residuals with the CUDA exec on H100. The versions of the exact same code but wit…
-
When I want to change the first value with OpenMP selected
and the number of threads set to be greater than 1 on an Intel CPU,
like this: omp_set_num_threads(16);
the program will take a long t…
-
I observe that libxsmm_fsspmdm_create is giving a segfault when ldb and ldc are large. The cutoff ldb/ldc value for segfault seems to vary a bit with the size of the A matrix.
I managed to recreate…
-
Consider the following script:
```python
A = torch.sparse_coo_tensor(
    indices=[
        [1500, 1505, 1506],
        [8347, 8347, 8347],
    ],
    values=[1., 1., 1.],
    size=[2523, 13716],
    dev…
```
-
@srajama1 @ndellingwood @brian-kelley @vqd8a
It seems that we might want to add new algorithms that were developed recently to that header:
- KokkosSparse_spadd.hpp
- KokkosSparse_spiluk.hpp
- K…
lucbv updated
4 years ago
-
With the fixes, kokkos PRs 4014 and 4029, kokkos-kernels PR #958, and trilinos PR 9123, this is the last remaining issue for building the new trilinos stack on Windows-LLVM without CUDA.
In fil…
-
This issue refers to the discussion about running sequential operations on the host rather than in the device kernels (reference, openmp, cuda, etc.).
I would propose having a cl…
-
I just noticed this nice community effort on GraphBLAS-based algorithms.
I am curious whether there have been any attempts at, or interest in, translating a complete [AMG solver](https://en.wikipedia.org/wiki/Multigr…
-
As shown in the paper, the CUTLASS library is used for speedup, but I did not find any code that relies on this setup. How should I verify that SparseGPT is faster than dense models when doing inference? Even with end-…