Closed jemiryguo closed 1 year ago
I found that this issue has been fixed in the latest version of the code, but the latest version in Julia's official registry is still 4.0.2. Perhaps a new release could be tagged to make this feature available. Thanks!
I will close this issue to avoid bothering anyone further.
`TensorOperations` uses the transpose-transpose-GEMM-transpose (TTGT) algorithm to compute a tensor contraction. When the number of BLAS threads is set to a large value (e.g. 48), most of the time during a tensor contraction is spent on transposing, because the current implementation does not use multithreading for the transposes. After speeding up the transposes in TTGT with multithreading, using the `@strided` macro in `Strided.jl` (another package by @Jutho), the total time is greatly reduced.
Output is
I hope that this feature or at least a switch to turn it on can be added.
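
To make the request concrete, here is a minimal sketch of a hand-written TTGT contraction with an optional multithreaded transpose step. It is not the `TensorOperations` implementation: the function name `ttgt_contract`, the `threaded` keyword, and the specific contraction pattern are hypothetical and chosen only for illustration. It assumes `Strided.jl` is installed and that Julia was started with several threads (e.g. `julia -t 8`), since otherwise `@strided` has no extra threads to use.

```julia
using LinearAlgebra
using Strided  # assumed to be installed; provides the @strided macro

# Hypothetical helper, not part of TensorOperations.
# Computes C[a, b] = sum over i, j of A[i, a, j] * B[j, b, i] via TTGT.
function ttgt_contract(A::Array{T,3}, B::Array{T,3}; threaded::Bool=true) where {T}
    dI, dA, dJ = size(A)
    dJ2, dB, dI2 = size(B)
    @assert dI == dI2 && dJ == dJ2 "contracted dimensions must match"

    # Transpose step: bring the contracted indices (i, j) next to each other,
    # last in A and first in B, so that both tensors can be viewed as matrices.
    At = Array{T}(undef, dA, dI, dJ)  # At[a, i, j] = A[i, a, j]
    Bt = Array{T}(undef, dI, dJ, dB)  # Bt[i, j, b] = B[j, b, i]
    if threaded
        # Under @strided, permutedims is lazy and the broadcast assignment
        # is what performs the (potentially multithreaded) copy.
        @strided At .= permutedims(A, (2, 1, 3))
        @strided Bt .= permutedims(B, (3, 1, 2))
    else
        # Single-threaded permutation from Base.
        permutedims!(At, A, (2, 1, 3))
        permutedims!(Bt, B, (3, 1, 2))
    end

    # GEMM step: one matrix product, dispatched to BLAS for BLAS element types.
    # The result is already in (a, b) order, so no final transpose is needed here.
    return reshape(At, dA, dI * dJ) * reshape(Bt, dI * dJ, dB)
end
```

With `threaded=false` only the GEMM step benefits from `BLAS.set_num_threads`, which matches the bottleneck described above; with `threaded=true` the permutations can also run on Julia's own threads. A keyword like this is one possible shape for the switch requested here.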