Open rohany opened 2 years ago
CTF treats vectors no differently from tensors, and the code has to account for generalized redistribution and storage formats. We used specialized code to handle vectors without these overheads in a different project and found the performance gains to be quite significant. We will integrate this into CTF soon and tag this issue when it is done.
I'm running some CTF SpMV kernels using the contraction interface (`a["i"] = B["ij"] * c["j"]`), and I'm seeing performance that is nearly two orders of magnitude slower than systems like PETSc and Trilinos on large matrices from the SuiteSparse collection (such as the arabic-2005 graph). I know I've asked similar questions before, but I was wondering whether such a discrepancy is expected, or whether there is a specialized kernel available in CTF for the SpMV operation. I believe I've configured CTF correctly for my system, but I'm happy to share the configuration logs to double-check.

cc @solomonik
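
For reference, the contraction `a["i"] = B["ij"] * c["j"]` is a sparse matrix-vector product. Below is a minimal single-node sketch of what a specialized kernel typically computes, assuming the matrix is stored in CSR (compressed sparse row) format. The names `CsrMatrix` and `spmv` are hypothetical and used only for illustration; they are not part of the CTF, PETSc, or Trilinos APIs, and this is not how any of those libraries necessarily implement the operation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration only (not a CTF type).
struct CsrMatrix {
    std::size_t rows;                  // number of rows
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes a[i] = sum_j B[i][j] * c[j], visiting only the stored nonzeros.
std::vector<double> spmv(const CsrMatrix& B, const std::vector<double>& c) {
    std::vector<double> a(B.rows, 0.0);
    for (std::size_t i = 0; i < B.rows; ++i) {
        double acc = 0.0;
        for (std::size_t k = B.row_ptr[i]; k < B.row_ptr[i + 1]; ++k) {
            acc += B.values[k] * c[B.col_idx[k]];
        }
        a[i] = acc;
    }
    return a;
}
```

A kernel of this shape does work proportional to the number of nonzeros and needs no data redistribution, which is the kind of baseline a general tensor contraction path is being compared against here.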