Using efficient np.linalg.multi_dot

LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

https://pyroomacoustics.readthedocs.io

MIT License

1.35k stars 419 forks source link

Using efficient np.linalg.multi_dot #271

Closed maldil closed 1 year ago

maldil commented 1 year ago

It is more succinct and effective to refactor code to use np.linalg.multi_dot rather than numerous np.dot routines. What do you think about this change that has practical value?

[ ] Are there docstrings ? Do they follow the [numpydoc] (https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) style ? There are no docstrings
[ ] Have you run the tests by doing nosetests or py.test at the root of the repo ?
[ ] Have you checked that the doc builds properly and that any new file has been added to the repo ? How to do that is covered in the documentation. Yes
[ ] Is there a unit test for the proposed code modification ? If the PR addresses an issue, the test should make sure the issue is fixed. No, this is a refactoring.
[ ] Last but not least, did you document the proposed change in the CHANGELOG file ? It should go under "Unreleased". No, this is a refactoring.

Happy PR :smiley:

fakufaku commented 1 year ago

Hi @maldil , thank you so much for taking the time to open a pull request.

I didn't know about multi_dot. Thank you for pointing this out. It is new from numpy v1.19.0.

This would break compatibility for older versions of numpy, so I don't think we should use it, unless there is a large speed improvement for example. Have you measured the runtime difference ?

maldil commented 1 year ago

Hi, Thank you for the comment. In the API doc, It is recommended to use np.linalg.multi_dot instead of multiple dot due to efficiency. Following example further proved to me that it is 4.7x faster than using multiple dot s.


    from datetime import datetime

    A = np.random.random((10000, 100))
    B = np.random.random((100, 1000))
    C = np.random.random((1000, 5))
    D = np.random.random((5, 333))

    multi_dot_start = datetime.now()
    _ = np.linalg.multi_dot([A, B, C, D])
    multi_dot_end = datetime.now()
    print("multi_dot execution time",(multi_dot_end-multi_dot_start).microseconds)

    dot_start = datetime.now()
    _ = np.dot(np.dot(np.dot(A, B), C), D)
    dot_end = datetime.now()
    print("multi_dot execution time",(dot_end - dot_start).microseconds)
    np.linalg.multi_dot([A[k], SS.ppT[k], A[k].T])```

fakufaku commented 1 year ago

Right, I understand it should be faster because it will pick the optimal ordering of parenthesis. I'm more interested in the practical impact on the performance of DOA computations. If we can get 4.5x improvement overall on the DOA module that would be worth thinking about it. Do you have a performance comparison on the DOA module ?

maldil commented 1 year ago

No, I do not have performance comparison for DOA module right now.

fakufaku commented 1 year ago

I think you could quickly make one by timing the doa algorithms in the examples/doa_algorithms.py script. For runtime measurements, it is recommended to use time.perf_counter, repeat the computation you want to measure several times and take the mean or median.

maldil commented 1 year ago

I'm still looking for spare time to do this experiment. Regarding your concern about the NumPy version, it appears to have been around for a long, at least since 1.13. This is the oldest API doc I could locate. It was present in 1.13, according to that.

To be sure, I installed the oldest installable NumPy version on my Mac. I was able to use np.linalg.multi_dot as seen below.

On a side topic, if you use a certain NumPy version, I would urge that you declare it here. Because it now downloads the most recent version based on the requirement.txt. https://github.com/LCAV/pyroomacoustics/blob/b7323354b82d292d12850d59eef831fef491b25d/requirements.txt#L2

fakufaku commented 1 year ago

Hum, you are right, numpy 1.13.0 was released in June 2017. We can probably reasonably expect most people to use something more recent 😄 But then, we need to update the requirements in

requirements.txt line 2
setup.py line 183
pyproject.toml line 3

to numpy>=1.13.0.