tikhonov_filter very slow on arm64 arch

bwohlberg / sporco

Sparse Optimisation Research Code

http://brendt.wohlberg.net/software/SPORCO/

BSD 3-Clause "New" or "Revised" License


tikhonov_filter very slow on arm64 arch #11

Closed by IeiuniumLux 4 years ago

IeiuniumLux commented 4 years ago

If I run the attached test.ipynb notebook (remove the .txt extension) on Google Colab, or the equivalent test.py on a local AMD/Intel system with a GeForce GTX 1660 installed, every call to tikhonov_filter completes in less than a second. However, if I run the same test.py on a Jetson TX2, the first call to tikhonov_filter takes more than 90 seconds for the _Xfftn function to return. Surprisingly, subsequent calls complete in under a second. Does anyone know why this only happens on the arm64 architecture?

I have tried pyFFTW 0.12.0 both built from source and installed via pip3, and the result is the same.

profile_trace.txt test.ipynb.txt test.py.txt

test_output lscpu

IeiuniumLux commented 4 years ago

As a workaround, can tikhonov_filter use the wisdom feature instead of the _Xfftn function?

It allows optimized transforms to be stored and recalled.

bwohlberg commented 4 years ago

> If I run the attached test.ipynb notebook (remove the .txt extension) on Google Colab, or the equivalent test.py on a local AMD/Intel system with a GeForce GTX 1660 installed, every call to tikhonov_filter completes in less than a second. However, if I run the same test.py on a Jetson TX2, the first call to tikhonov_filter takes more than 90 seconds for the _Xfftn function to return. Surprisingly, subsequent calls complete in under a second. Does anyone know why this only happens on the arm64 architecture?
>
> I have tried pyFFTW 0.12.0 both built from source and installed via pip3, and the result is the same.

Presumably the long delay is due to the FFTW wisdom being computed on the first call and then the cached wisdom being used on subsequent calls. It's strange, though, that this is so much slower on the TX2. I would suggest submitting an issue with PyFFTW.
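The first-call pattern described above can be reproduced in isolation with a toy stand-in. This is only a sketch: `make_plan`, `transform`, and `time_calls` are hypothetical names, the 0.2 s sleep is a simulated planning cost rather than a measurement, and no FFTW code is involved. It shows how timing successive calls separately exposes a one-time setup cost that is cached afterwards.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def make_plan(shape):
    """Stand-in for FFT planning: slow once per shape, then cached."""
    time.sleep(0.2)  # simulated planning cost, not a real measurement
    return ("plan", shape)


def transform(shape):
    """Stand-in for a transform that needs a plan before it can run."""
    return make_plan(shape)


def time_calls(fn, *args, repeats=3):
    """Return wall-clock seconds for each of `repeats` successive calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Only the first entry should include the simulated planning time.
    print(time_calls(transform, (256, 256)))
```

Applying the same per-call timing to the real tikhonov_filter call on both machines would show whether the whole delay sits in the first call.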

bwohlberg commented 4 years ago

> As a workaround, can tikhonov_filter use the wisdom feature instead of the _Xfftn function?
>
> It allows optimized transforms to be stored and recalled.

tikhonov_filter uses the FFT functions in sporco.linalg, which currently use the pyfftw numpy interface. I've considered replacing this interface with the more general pyfftw interface that allows access to the wisdom, but I'm afraid it's pretty far down the ToDo list at this point.
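In the meantime, FFTW wisdom is global to the process, so it may be possible to save and restore it from user code without changing sporco. The following is a minimal sketch and has not been tested on a TX2. `pyfftw.export_wisdom` and `pyfftw.import_wisdom` are the real pyFFTW functions, but the helper names and the cache path are hypothetical. Whether the saved wisdom is actually reused depends on the later transforms matching in shape, dtype, thread count, and planner effort.

```python
import pickle
from pathlib import Path

# Hypothetical cache location; any writable path would do.
WISDOM_PATH = Path.home() / ".cache" / "fftw_wisdom.pkl"


def save_wisdom(path=WISDOM_PATH):
    """Write the wisdom FFTW has accumulated in this process to disk."""
    import pyfftw  # imported here so the helpers load even without pyfftw

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        # export_wisdom() returns a tuple of byte strings, one per precision.
        pickle.dump(pyfftw.export_wisdom(), f)


def load_wisdom(path=WISDOM_PATH):
    """Import saved wisdom if present; return True when a file was loaded."""
    import pyfftw

    path = Path(path)
    if not path.exists():
        return False
    with path.open("rb") as f:
        pyfftw.import_wisdom(pickle.load(f))
    return True
```

Under these assumptions, a script would call `load_wisdom()` before its first tikhonov_filter call and `save_wisdom()` after it, so that only the very first run on a given machine pays the planning cost.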

IeiuniumLux commented 4 years ago

Then, I'll close this here and open an issue with PyFFTW as suggested. Thanks @bwohlberg.