Closed by Bihaqo 6 years ago.
On CPU, T3f scales worse than linearly in some operations, e.g. when multiplying a batch of TT-matrices by a TT-matrix. This is unexpected.
Also, matrix-by-matrix multiplication is slower than in TTPY.
See the profile_cpu_vs_gpu branch.
The cause turned out to be tf.transpose, which is slow for tensors with more than 5 dimensions. To mitigate this, apply the following patch and compile TensorFlow from source.
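To see where tensors with more than 5 dimensions come from, here is a minimal NumPy sketch of multiplying one TT-core of a batch of TT-matrices by one TT-core of a TT-matrix. The core layouts and the helper name batch_tt_core_matmul are assumptions for illustration; t3f's actual implementation may organize the computation differently. The intermediate product has 7 axes, and regrouping them takes a 7-D transpose, which in TensorFlow would be a tf.transpose over more than 5 dimensions.

```python
import numpy as np


def batch_tt_core_matmul(a_core, b_core):
    """Multiply one core of a batch of TT-matrices by one core of a TT-matrix.

    Hypothetical helper for illustration. Assumed shapes:
      a_core: (batch, r1, n, m, r2)  core of a batch of TT-matrices
      b_core: (s1, m, k, s2)         core of a single TT-matrix
    Returns a core of shape (batch, r1 * s1, n, k, r2 * s2).
    """
    batch, r1, n, m, r2 = a_core.shape
    s1, m_b, k, s2 = b_core.shape
    assert m == m_b, "column mode of a_core must match row mode of b_core"
    # Contract over the shared mode m. Remaining axes of a_core come first,
    # then those of b_core: (batch, r1, n, r2, s1, k, s2), a 7-D tensor.
    prod = np.tensordot(a_core, b_core, axes=([3], [1]))
    # Regroup to (batch, r1, s1, n, k, r2, s2). This is the high-rank
    # transpose; in TensorFlow it would be tf.transpose on a 7-D tensor.
    prod = prod.transpose(0, 1, 4, 2, 5, 3, 6)
    # Merge the rank axes pairwise (r1 with s1, r2 with s2).
    return prod.reshape(batch, r1 * s1, n, k, r2 * s2)


rng = np.random.default_rng(0)
a = rng.standard_normal((8, 2, 3, 4, 2))  # batch of 8 cores
b = rng.standard_normal((3, 4, 5, 3))
c = batch_tt_core_matmul(a, b)
print(c.shape)  # (8, 6, 3, 5, 6)
```

The batch axis is what pushes the intermediate past 5 dimensions: without it the same product would be 6-D, and each extra leading axis adds one more dimension to the transpose.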