RubixML / Tensor

A library and extension that provides objects for scientific computing in PHP.
https://rubixml.com
MIT License
223 stars 27 forks source link

Matrix multiplication by Strassen algorithm #4

Closed tuqqu closed 3 years ago

tuqqu commented 3 years ago

Hello. I'd like to propose adding an alternative to the standard matrix multiplication, the multiplication using Strassen algorithm, which asymptotic complexity is lower, O(n^2.81) vs ϴ(n^3)

github-actions[bot] commented 3 years ago

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our [Contributor License Agreement]() before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

andrewdalpino commented 3 years ago

Nice work @tuqqu! How did the benchmarks compare vs the moderately cache-optimized O(n^3) algorithm? I've heard that Strassen is only faster for very large matrices (due to cache obliviousness). I love that you've provided Strassen as an alternative instead of a replacement, perhaps we can doubly dispatch to Strassen from matmul() if the matrix is above a certain size once we find the inflection point.

andrewdalpino commented 3 years ago

Did you need help @tuqqu?

tuqqu commented 3 years ago

@andrewdalpino hello! thanks for reviewing this PR. I did see your comments and invested some time in refining the PR, but I stumbled upon the issue that no matter how I try to optimise the algorithm, it still looses to the standard one in terms of time. In theory it ought not to be loosing, but it nonetheless is. It is mostly because of the steps we have to perform before we make computations, i.e. expanding original matrix with zeros, creating multiple new matrices, etc. I have a feeling that PHP's arrays implementation is the main issue here which affects performance.

I will surely dedicate more time in investigating this, but I cannot say when exactly.

andrewdalpino commented 3 years ago

Hey @tuqqu, I'm 99% certain it's due to the cache oblivious nature of Strassen (this is well documented see the Wikipedia article below). The reason why you usually do not see Strassen used in practice is because cache misses tend to dominate the runtime.

https://en.wikipedia.org/wiki/Strassen_algorithm#Cache_behavior

I believe we would encounter this in any language where the entire matrices cannot fit into the CPU cache. A brief look at the code shows cases of non-linear array traversal which is most likely the culprit. I go into some details about cache-aware array traversal in this video.

https://youtu.be/5YtNugpMhyQ?t=572

How much slower are we talking? How are they compared on memory?

andrewdalpino commented 3 years ago

Any updates @tuqqu? I don't feel like we should give up just yet ... there may exist some situations where Strassen will be faster

Let me know if you needed any help

tuqqu commented 3 years ago

@andrewdalpino I experimented with the optimisations you had suggested, sadly, to no avail. It gained some speed for sure, but the overall performance is very poor.

We are talking about roughly 2.3 secs for 100x100 matrices vs 0.23 secs. The bigger the matrices get the overwhelmingly poorer performance becomes, which is not the case for the standard computational method. I have read about cache-obliviousness, this definitely adds complexity, but I still believe that PHP-arrays are the issue. The mere fact that we are creating a lot of additional arrays for the Strassen algorithm makes the most impact.

I am afraid I have to suggest closing this MR as it is not practical.

andrewdalpino commented 3 years ago

Damn well that's ok, this experiment provides lot's of good insight and will surely guide our path forward as we make future optimizations. Thank you for your contribution!

And if you're not already, join us in our Telegram channel https://t.me/RubixML