Open imtiyazuddin opened 2 months ago
Is there any way to parallelize the code to make it run faster?
There are a number of reasons this is slow:
sal += pr.matmul(m)
Because the matrices are large, iterating through the indices one element at a time may cause cache misses, forcing the processor to keep fetching data from main memory.
The matrix itself isn't tiny. This is 1000 × 50 × 50k = 2.5 billion multiplies. They are all done sequentially, and each one makes its own call to the communicator, so that is 2.5 billion calls. Use matmul instead.
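To make the call-count argument concrete, here is a minimal sketch in plain Python. `CountingBackend` is a hypothetical stand-in, not the library's API: it assumes every scalar multiply costs one communicator call and that a single `matmul` costs one call for the whole product. Real backends may batch differently, so treat the numbers as illustrative only.

```python
class CountingBackend:
    """Hypothetical backend that counts simulated communicator calls."""

    def __init__(self):
        self.calls = 0

    def mul(self, a, b):
        # Assumption: one scalar multiply = one communicator call.
        self.calls += 1
        return a * b

    def matmul(self, A, B):
        # Assumption: one batched matmul = one communicator call in total.
        self.calls += 1
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]


def loop_matmul(backend, A, B):
    """Element-by-element product: n * k * m separate multiply calls."""
    n, k, m = len(A), len(B), len(B[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += backend.mul(A[i][p], B[p][j])
    return out


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [0, 1], [2, 2]]      # 3 x 2

loop_backend = CountingBackend()
batched_backend = CountingBackend()

loop_result = loop_matmul(loop_backend, A, B)
batched_result = batched_backend.matmul(A, B)

print(loop_backend.calls)     # 2 * 2 * 3 = 12 calls
print(batched_backend.calls)  # 1 call

# The same counting at the sizes from this thread:
print(1000 * 50 * 50_000)     # 2500000000
```

Both paths produce the same matrix. Only the number of round trips differs, which is why a single `matmul` call is the suggestion here.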
This simple matrix multiplication code is taking forever to run. What could be the issue?
The dimensions are correct, but it takes too much time and never finishes.