jeffry1829 opened this issue 3 weeks ago
What did you compare them to, the CPU version? How large is the input tensor? To track down the cause, the NVIDIA profiler may help.
GPU Det and Directsum are ridiculously slow
Det uses cuSOLVER ?getrf (LU factorization)
I'm currently not sure whether this only affects these two methods.
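For context on what Det does around ?getrf: once the LU factors with partial pivoting are available, the determinant is the product of U's diagonal, with the sign flipped once per row interchange. That final step is only O(n), which is worth keeping in mind when profiling. The sketch below is a plain-Python stand-in for getrf, not the cuSOLVER API. `lu_getrf` and `det_from_getrf` are hypothetical names, and pivots are 0-based here, whereas LAPACK's are 1-based.

```python
def lu_getrf(a):
    """LU factorization with partial pivoting, in the style of LAPACK ?getrf.

    Returns (lu, piv): lu packs L (unit lower, stored below the diagonal)
    and U (on and above the diagonal); piv[k] is the row swapped with
    row k at step k (0-based, unlike LAPACK's 1-based ipiv).
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a copy
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        if lu[k][k] == 0.0:
            continue  # whole column is zero: U[k][k] = 0, so det = 0
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # elimination multiplier, kept as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def det_from_getrf(a):
    """det(A) = sign(P) * prod(diag(U)), computed from the packed LU factors."""
    lu, piv = lu_getrf(a)
    det = 1.0
    for k, p in enumerate(piv):
        det *= lu[k][k]
        if p != k:
            det = -det  # each row interchange flips the sign
    return det
```

For example, `det_from_getrf([[1, 2], [3, 4]])` gives approximately -2.0, matching ad - bc.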
Are you benchmarking against the CPU version, or the old MAGMA version?
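The comparison is only meaningful if both sides are timed the same way. Two pitfalls matter here. GPU kernel launches return before the work finishes, so the timer must wait on a device synchronize. The first call can also include one-off setup costs, such as creating library handles. Below is a minimal, library-agnostic harness built on that assumption. `benchmark` and its `sync` parameter are hypothetical; pass whatever synchronize call your GPU backend provides.

```python
import statistics
import time


def benchmark(fn, *, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock time (seconds) of fn() over `repeats` runs.

    warmup: untimed calls made first, so one-off costs (e.g. handle or
    context creation) are excluded from the measurement.
    sync: optional callable that blocks until queued device work is done;
    needed for GPU code, since kernel launches are asynchronous.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain warm-up work before timing starts
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # count the queued device work in this run's time
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

Run it on the CPU path and the GPU path at several input sizes, and report the sizes alongside the timings. For small inputs, fixed per-call overhead can outweigh the actual computation on the GPU.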