SHTOOLS / SHTOOLS

SHTOOLS - Spherical Harmonic Tools
https://shtools.github.io/SHTOOLS/
BSD 3-Clause "New" or "Revised" License
363 stars 106 forks source link

Allow usage of other spherical harmonic transform backends #258

Closed MarkWieczorek closed 3 years ago

MarkWieczorek commented 4 years ago

Context

pyshtools makes use of the the python wrapped Fortran 95 SHTOOLS routines. These routines are well tested and relatively fast. However, C-based routines can be faster by a factor of two or so, and optimization of the existing routines likely can only increase efficiency by several 10s of percent. It is unlikely that another C-based package will come along to replace all the Fortran 95 routines, but it would be beneficial to give the option of using faster routines for some basic operations such as basic spherical harmonic transforms and expansions.

Proposal

Methods such as .expand() should accept an optional argument backend that will identify the software package used to perform the spherical harmonic transform. The default value, until a more suitable backend is found, will be 'shtools' for the Fortran 95 SHTOOLS archive.

Requirements

mreineck commented 3 years ago

I can offer a prototype of a wrapper for the DUCC SHT backend (https://gitlab.mpcdf.mpg.de/mtr/ducc). So far I have only tested a small part of the functionality (rotation of spherical harmonic coefficients and DH grid synthesis/analysis), but they seem to work quite well. [Edit: I mean to say that I only tried calling these functions from pyshtools ... the SHT functionality in DUCC is pretty thoroughly tested ;-)]

Pros

Cons

If there is interest in such a wrapper, I can provide a few demos and a rough prototype later today, which covers a small part of the available calls. Most likely I won't have time to implement the full wrapper though. I expect it won't be difficult, but a lot of tedious, repetitive work.

What do you think?

mreineck commented 3 years ago

Here are a few timings, made with https://github.com/mreineck/SHTOOLS/tree/ducc_wrapper_demo

Single threaded:

martin@marvin:~/codes/SHTOOLS$ python3 examples/python/SHRotations/SHRotations_ducc.py 
nthreads= 1
lmax:  1000
/home/martin/codes/SHTOOLS/examples/python/SHRotations/SHRotations_ducc.py:28: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.zeros((2, lmax + 1, lmax + 1), dtype=np.bool)
computing rotation matrix for Euler angles: (10.00,20.00,30.00)
Time for SHTOOLS rotation:  3.7298920154571533
Time for DUCC rotation:  0.27285337448120117
Maximum difference in rotated coefficients:  9.805510570171094e-11
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py 
nthreads= 1
Driscoll-Healy (real) sampling = 2
/home/martin/codes/SHTOOLS/examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py:31: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.zeros((2, maxdeg + 1, maxdeg + 1), dtype=np.bool)
creating 8396802 random coefficients
code     lmax    maxerror    rms         tinverse    tforward
SHTOOLS:    1    8.14e-16    1.92e-31    3.6e-04s    2.7e-05s
ducc0:      1    3.26e-16    6.86e-32    1.8e-04s    6.7e-05s
SHTOOLS:    2    2.34e-15    8.95e-31    2.5e-05s    2.0e-05s
ducc0:      2    3.08e-15    1.15e-30    5.7e-05s    5.8e-05s
SHTOOLS:    4    7.67e-15    4.71e-30    1.3e-02s    1.4e-02s
ducc0:      4    6.68e-14    1.89e-28    1.6e-04s    1.9e-04s
SHTOOLS:    8    1.70e-13    3.81e-28    3.7e-02s    3.4e-02s
ducc0:      8    7.55e-13    7.12e-27    3.1e-04s    2.1e-04s
SHTOOLS:   16    8.49e-12    2.62e-25    6.4e-03s    5.1e-03s
ducc0:     16    3.39e-12    5.09e-26    2.5e-04s    2.7e-04s
SHTOOLS:   32    1.65e-12    7.05e-27    3.2e-02s    2.7e-02s
ducc0:     32    1.08e-12    3.64e-27    4.3e-04s    4.5e-04s
SHTOOLS:   64    3.54e-11    7.42e-25    6.1e-02s    5.7e-02s
ducc0:     64    8.24e-11    2.55e-24    9.5e-04s    9.6e-04s
SHTOOLS:  128    1.99e-10    4.61e-24    1.1e-01s    1.1e-01s
ducc0:    128    4.12e-10    1.31e-23    2.4e-03s    2.4e-03s
SHTOOLS:  256    3.81e-09    2.31e-22    6.1e-01s    5.0e-01s
ducc0:    256    4.16e-09    2.79e-22    1.7e-02s    1.7e-02s
SHTOOLS:  512    5.26e-08    1.16e-20    3.1e-01s    3.2e-01s
ducc0:    512    6.21e-08    1.71e-20    2.6e-02s    2.6e-02s
SHTOOLS: 1024    1.36e-07    1.99e-20    2.1e+00s    2.0e+00s
ducc0:   1024    1.17e-07    2.05e-20    1.3e-01s    1.2e-01s
SHTOOLS: 2048    3.25e-06    2.65e-18    1.3e+01s    1.3e+01s
ducc0:   2048    2.03e-06    1.43e-18    1.5e+00s    1.5e+00s

Eight threads:

martin@marvin:~/codes/SHTOOLS$ python3 examples/python/SHRotations/SHRotations_ducc.py 
nthreads= 8
lmax:  1000
/home/martin/codes/SHTOOLS/examples/python/SHRotations/SHRotations_ducc.py:28: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.zeros((2, lmax + 1, lmax + 1), dtype=np.bool)
computing rotation matrix for Euler angles: (10.00,20.00,30.00)
Time for SHTOOLS rotation:  3.7409327030181885
Time for DUCC rotation:  0.0915982723236084
Maximum difference in rotated coefficients:  1.548456918243346e-10
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py 
nthreads= 8
Driscoll-Healy (real) sampling = 2
/home/martin/codes/SHTOOLS/examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py:31: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask = np.zeros((2, maxdeg + 1, maxdeg + 1), dtype=np.bool)
creating 8396802 random coefficients
code     lmax    maxerror    rms         tinverse    tforward
SHTOOLS:    1    8.14e-16    1.92e-31    4.7e-04s    7.2e-05s
ducc0:      1    3.26e-16    6.86e-32    9.2e-04s    4.3e-04s
SHTOOLS:    2    2.34e-15    8.95e-31    9.2e-05s    6.2e-05s
ducc0:      2    3.08e-15    1.15e-30    4.5e-04s    4.2e-04s
SHTOOLS:    4    7.67e-15    4.71e-30    1.3e-02s    1.4e-02s
ducc0:      4    6.68e-14    1.89e-28    3.9e-04s    3.6e-04s
SHTOOLS:    8    5.52e-13    3.78e-27    3.8e-02s    3.6e-02s
ducc0:      8    7.55e-13    7.12e-27    6.0e-04s    6.4e-04s
SHTOOLS:   16    8.49e-12    2.62e-25    6.5e-03s    5.4e-03s
ducc0:     16    3.39e-12    5.09e-26    5.9e-04s    5.4e-04s
SHTOOLS:   32    1.65e-12    7.05e-27    3.2e-02s    2.8e-02s
ducc0:     32    1.08e-12    3.64e-27    5.8e-04s    6.5e-04s
SHTOOLS:   64    3.54e-11    7.42e-25    6.2e-02s    5.9e-02s
ducc0:     64    8.24e-11    2.55e-24    1.0e-03s    1.2e-03s
SHTOOLS:  128    1.99e-10    4.61e-24    1.1e-01s    1.1e-01s
ducc0:    128    4.12e-10    1.31e-23    2.0e-03s    1.9e-03s
SHTOOLS:  256    3.81e-09    2.31e-22    6.1e-01s    5.0e-01s
ducc0:    256    4.16e-09    2.79e-22    1.0e-02s    1.4e-02s
SHTOOLS:  512    5.26e-08    1.16e-20    3.1e-01s    3.0e-01s
ducc0:    512    6.21e-08    1.71e-20    3.7e-02s    2.7e-02s
SHTOOLS: 1024    1.36e-07    1.99e-20    2.1e+00s    2.0e+00s
ducc0:   1024    1.17e-07    2.05e-20    1.1e-01s    6.8e-02s
SHTOOLS: 2048    3.25e-06    2.65e-18    1.3e+01s    1.3e+01s
ducc0:   2048    2.03e-06    1.43e-18    4.2e-01s    3.5e-01s
mreineck commented 3 years ago

I'm now supporting conversion of Condon-Shortley phase and normalizations 1, 2, and 4. Not sure whether I can support unnormalized coefficients (norm=3); the conversion factors become extremely crazy.

Concerning the interface, I won't be able to copy the interface of SHRotateRealCoef, since ducc0 does not need a precomputed Wigner matrix, and we definitely should not compute it if it is not required.

MarkWieczorek commented 3 years ago

This looks great! (Though I didn't have time to read through all the code). Here are just a few quick comments:

  1. It is not too important to treat the unnormalized case, as this only works for very low degrees. I would probably only allow this option when L is less than some predefined value that gives good accuracy. I would need to look through my code, but I seem to recall L=90 was about the highest I could go...
  2. For the creation of grids, we will need to take into account the option of sampling in longitude (sampling=1 and 2), as well as the option extend that allows you to optionally compute the redundant 360 longitude, and 90 S. This is built into alot of the routines in SHGrid, so if this isn't done, it will cause problems upstream...
  3. Are there routines for complex data and complex coefficients? If not, we could just add an error if you try to use this backend with these data types. We don't need to implement this immediately.
  4. Is there a routine for computing the gradient? Again, we don't need to implement this immediately.
  5. Would it be better to put the wrapper functions in the ducc code base, instead of pyshtools?
  6. And lastly, is it possible to install ducc with conda?

I would be happy to help add documentation to the docstrings for the routines that have ducc support. It looks like you didn't modify the SHGrid and SHCoeffs classes, and I could probably help with this as well. Also, I will probably need to write/re-write the implementation webpage describing the different backends and timing tests.

Just let me know what I can do to help.

mreineck commented 3 years ago
  1. It is not too important to treat the unnormalized case, as this only works for very low degrees. I would probably only allow this option when L is less than some predefined value that gives good accuracy. I would need to look through my code, but I seem to recall L=90 was about the highest I could go...

I think I found a way to support this in the meantime. It is pretty slow, but I don't expect that anyone is using this convention for performance-sensitive work, so that should be fine. Limiting this to some fixed L is probably a good precaution.

  1. For the creation of grids, we will need to take into account the option of sampling in longitude (sampling=1 and 2), as well as the option extend that allows you to optionally compute the redundant 360 longitude, and 90 S. This is built into alot of the routines in SHGrid, so if this isn't done, it will cause problems upstream...

sampling=1/2 is already supported. "extend" is not yet in, but should not be too much work.

  1. Are there routines for complex data and complex coefficients? If not, we could just add an error if you try to use this backend with these data types. We don't need to implement this immediately.

There is no native support in ducc for this, but it can be emulated by processing real and imaginary parts separately, I think.

  1. Is there a routine for computing the gradient? Again, we don't need to implement this immediately.

I think I have something pretty similar in the C++ implementation, but it does not have a Python interface yet. This may have to wait a bit.

  1. Would it be better to put the wrapper functions in the ducc code base, instead of pyshtools?

I would argue that the wrapper provides potential benefits to all pyshtools users, but only to those ducc users which are also using pyshtools, so I'd argue for inclusion in pyshtools. But there is also the option to make this a completely standalone code base. Concerning the license, I'm fine with releasing the wrapper code under the same license as shtools.

  1. And lastly, is it possible to install ducc with conda?

As far as I understand, any PyPI-hosted package can be installed via conda (see, e.g., https://stackoverflow.com/questions/29286624/how-to-install-pypi-packages-using-anaconda-conda-command), but I'm definitely not an expert on this.

I would be happy to help add documentation to the docstrings for the routines that have ducc support. It looks like you didn't modify the SHGrid and SHCoeffs classes,

Correct, because I'm not sure where and how exactly you plan to add the checks

if <backend-xyz.available>:
    call backend-xyz.func()
else:
    call shtools.func()

You will know best where this kind of code should go. Once this is in place, I could fill in my calls; so if you have some time, adding this functionality would be great!

Also, I will probably need to write/re-write the implementation webpage describing the different backends and timing tests.

OK, but this is probably still a bit in the future :-)

mreineck commented 3 years ago

Please also note that my current glue code is really ugly and needs a lot of tweaking and formatting before it can be seriously considered for inclusion. For the moment, this is meant as a proof of concept only.