Closed MarkWieczorek closed 3 years ago
I can offer a prototype of a wrapper for the DUCC SHT backend (https://gitlab.mpcdf.mpg.de/mtr/ducc). So far I have only tested a small part of the functionality (rotation of spherical harmonic coefficients and DH grid synthesis/analysis), but they seem to work quite well.
[Edit: I mean to say that I only tried calling these functions from pyshtools
... the SHT functionality in DUCC is pretty thoroughly tested ;-)]
Pros
pybind11
.Cons
If there is interest in such a wrapper, I can provide a few demos and a rough prototype later today, which covers a small part of the available calls. Most likely I won't have time to implement the full wrapper though. I expect it won't be difficult, but a lot of tedious, repetitive work.
What do you think?
Here are a few timings, made with https://github.com/mreineck/SHTOOLS/tree/ducc_wrapper_demo
Single threaded:
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/SHRotations/SHRotations_ducc.py
nthreads= 1
lmax: 1000
/home/martin/codes/SHTOOLS/examples/python/SHRotations/SHRotations_ducc.py:28: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
mask = np.zeros((2, lmax + 1, lmax + 1), dtype=np.bool)
computing rotation matrix for Euler angles: (10.00,20.00,30.00)
Time for SHTOOLS rotation: 3.7298920154571533
Time for DUCC rotation: 0.27285337448120117
Maximum difference in rotated coefficients: 9.805510570171094e-11
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py
nthreads= 1
Driscoll-Healy (real) sampling = 2
/home/martin/codes/SHTOOLS/examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py:31: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
mask = np.zeros((2, maxdeg + 1, maxdeg + 1), dtype=np.bool)
creating 8396802 random coefficients
code lmax maxerror rms tinverse tforward
SHTOOLS: 1 8.14e-16 1.92e-31 3.6e-04s 2.7e-05s
ducc0: 1 3.26e-16 6.86e-32 1.8e-04s 6.7e-05s
SHTOOLS: 2 2.34e-15 8.95e-31 2.5e-05s 2.0e-05s
ducc0: 2 3.08e-15 1.15e-30 5.7e-05s 5.8e-05s
SHTOOLS: 4 7.67e-15 4.71e-30 1.3e-02s 1.4e-02s
ducc0: 4 6.68e-14 1.89e-28 1.6e-04s 1.9e-04s
SHTOOLS: 8 1.70e-13 3.81e-28 3.7e-02s 3.4e-02s
ducc0: 8 7.55e-13 7.12e-27 3.1e-04s 2.1e-04s
SHTOOLS: 16 8.49e-12 2.62e-25 6.4e-03s 5.1e-03s
ducc0: 16 3.39e-12 5.09e-26 2.5e-04s 2.7e-04s
SHTOOLS: 32 1.65e-12 7.05e-27 3.2e-02s 2.7e-02s
ducc0: 32 1.08e-12 3.64e-27 4.3e-04s 4.5e-04s
SHTOOLS: 64 3.54e-11 7.42e-25 6.1e-02s 5.7e-02s
ducc0: 64 8.24e-11 2.55e-24 9.5e-04s 9.6e-04s
SHTOOLS: 128 1.99e-10 4.61e-24 1.1e-01s 1.1e-01s
ducc0: 128 4.12e-10 1.31e-23 2.4e-03s 2.4e-03s
SHTOOLS: 256 3.81e-09 2.31e-22 6.1e-01s 5.0e-01s
ducc0: 256 4.16e-09 2.79e-22 1.7e-02s 1.7e-02s
SHTOOLS: 512 5.26e-08 1.16e-20 3.1e-01s 3.2e-01s
ducc0: 512 6.21e-08 1.71e-20 2.6e-02s 2.6e-02s
SHTOOLS: 1024 1.36e-07 1.99e-20 2.1e+00s 2.0e+00s
ducc0: 1024 1.17e-07 2.05e-20 1.3e-01s 1.2e-01s
SHTOOLS: 2048 3.25e-06 2.65e-18 1.3e+01s 1.3e+01s
ducc0: 2048 2.03e-06 1.43e-18 1.5e+00s 1.5e+00s
Eight threads:
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/SHRotations/SHRotations_ducc.py
nthreads= 8
lmax: 1000
/home/martin/codes/SHTOOLS/examples/python/SHRotations/SHRotations_ducc.py:28: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
mask = np.zeros((2, lmax + 1, lmax + 1), dtype=np.bool)
computing rotation matrix for Euler angles: (10.00,20.00,30.00)
Time for SHTOOLS rotation: 3.7409327030181885
Time for DUCC rotation: 0.0915982723236084
Maximum difference in rotated coefficients: 1.548456918243346e-10
martin@marvin:~/codes/SHTOOLS$ python3 examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py
nthreads= 8
Driscoll-Healy (real) sampling = 2
/home/martin/codes/SHTOOLS/examples/python/TimingAccuracy/TimingAccuracyDH_ducc.py:31: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
mask = np.zeros((2, maxdeg + 1, maxdeg + 1), dtype=np.bool)
creating 8396802 random coefficients
code lmax maxerror rms tinverse tforward
SHTOOLS: 1 8.14e-16 1.92e-31 4.7e-04s 7.2e-05s
ducc0: 1 3.26e-16 6.86e-32 9.2e-04s 4.3e-04s
SHTOOLS: 2 2.34e-15 8.95e-31 9.2e-05s 6.2e-05s
ducc0: 2 3.08e-15 1.15e-30 4.5e-04s 4.2e-04s
SHTOOLS: 4 7.67e-15 4.71e-30 1.3e-02s 1.4e-02s
ducc0: 4 6.68e-14 1.89e-28 3.9e-04s 3.6e-04s
SHTOOLS: 8 5.52e-13 3.78e-27 3.8e-02s 3.6e-02s
ducc0: 8 7.55e-13 7.12e-27 6.0e-04s 6.4e-04s
SHTOOLS: 16 8.49e-12 2.62e-25 6.5e-03s 5.4e-03s
ducc0: 16 3.39e-12 5.09e-26 5.9e-04s 5.4e-04s
SHTOOLS: 32 1.65e-12 7.05e-27 3.2e-02s 2.8e-02s
ducc0: 32 1.08e-12 3.64e-27 5.8e-04s 6.5e-04s
SHTOOLS: 64 3.54e-11 7.42e-25 6.2e-02s 5.9e-02s
ducc0: 64 8.24e-11 2.55e-24 1.0e-03s 1.2e-03s
SHTOOLS: 128 1.99e-10 4.61e-24 1.1e-01s 1.1e-01s
ducc0: 128 4.12e-10 1.31e-23 2.0e-03s 1.9e-03s
SHTOOLS: 256 3.81e-09 2.31e-22 6.1e-01s 5.0e-01s
ducc0: 256 4.16e-09 2.79e-22 1.0e-02s 1.4e-02s
SHTOOLS: 512 5.26e-08 1.16e-20 3.1e-01s 3.0e-01s
ducc0: 512 6.21e-08 1.71e-20 3.7e-02s 2.7e-02s
SHTOOLS: 1024 1.36e-07 1.99e-20 2.1e+00s 2.0e+00s
ducc0: 1024 1.17e-07 2.05e-20 1.1e-01s 6.8e-02s
SHTOOLS: 2048 3.25e-06 2.65e-18 1.3e+01s 1.3e+01s
ducc0: 2048 2.03e-06 1.43e-18 4.2e-01s 3.5e-01s
I'm now supporting conversion of Condon-Shortley phase and normalizations 1, 2, and 4. Not sure whether I can support unnormalized coefficients (norm=3); the conversion factors become extremely crazy.
Concerning the interface, I won't be able to copy the interface of SHRotateRealCoef
, since ducc0
does not need a precomputed Wigner matrix, and we definitely should not compute it if it is not required.
This looks great! (Though I didn't have time to read through all the code). Here are just a few quick comments:
extend
that allows you to optionally compute the redundant 360 longitude, and 90 S. This is built into alot of the routines in SHGrid, so if this isn't done, it will cause problems upstream...I would be happy to help add documentation to the docstrings for the routines that have ducc support. It looks like you didn't modify the SHGrid and SHCoeffs classes, and I could probably help with this as well. Also, I will probably need to write/re-write the implementation webpage describing the different backends and timing tests.
Just let me know what I can do to help.
- It is not too important to treat the unnormalized case, as this only works for very low degrees. I would probably only allow this option when L is less than some predefined value that gives good accuracy. I would need to look through my code, but I seem to recall L=90 was about the highest I could go...
I think I found a way to support this in the meantime. It is pretty slow, but I don't expect that anyone is using this convention for performance-sensitive work, so that should be fine. Limiting this to some fixed L is probably a good precaution.
- For the creation of grids, we will need to take into account the option of sampling in longitude (sampling=1 and 2), as well as the option extend that allows you to optionally compute the redundant 360 longitude, and 90 S. This is built into alot of the routines in SHGrid, so if this isn't done, it will cause problems upstream...
sampling=1/2 is already supported. "extend" is not yet in, but should not be too much work.
- Are there routines for complex data and complex coefficients? If not, we could just add an error if you try to use this backend with these data types. We don't need to implement this immediately.
There is no native support in ducc
for this, but it can be emulated by processing real and imaginary parts separately, I think.
- Is there a routine for computing the gradient? Again, we don't need to implement this immediately.
I think I have something pretty similar in the C++ implementation, but it does not have a Python interface yet. This may have to wait a bit.
- Would it be better to put the wrapper functions in the ducc code base, instead of pyshtools?
I would argue that the wrapper provides potential benefits to all pyshtools users, but only to those ducc users which are also using pyshtools, so I'd argue for inclusion in pyshtools. But there is also the option to make this a completely standalone code base. Concerning the license, I'm fine with releasing the wrapper code under the same license as shtools.
- And lastly, is it possible to install ducc with conda?
As far as I understand, any PyPI-hosted package can be installed via conda (see, e.g., https://stackoverflow.com/questions/29286624/how-to-install-pypi-packages-using-anaconda-conda-command), but I'm definitely not an expert on this.
I would be happy to help add documentation to the docstrings for the routines that have ducc support. It looks like you didn't modify the SHGrid and SHCoeffs classes,
Correct, because I'm not sure where and how exactly you plan to add the checks
if <backend-xyz.available>:
call backend-xyz.func()
else:
call shtools.func()
You will know best where this kind of code should go. Once this is in place, I could fill in my calls; so if you have some time, adding this functionality would be great!
Also, I will probably need to write/re-write the implementation webpage describing the different backends and timing tests.
OK, but this is probably still a bit in the future :-)
Please also note that my current glue code is really ugly and needs a lot of tweaking and formatting before it can be seriously considered for inclusion. For the moment, this is meant as a proof of concept only.
Context
pyshtools makes use of the the python wrapped Fortran 95 SHTOOLS routines. These routines are well tested and relatively fast. However, C-based routines can be faster by a factor of two or so, and optimization of the existing routines likely can only increase efficiency by several 10s of percent. It is unlikely that another C-based package will come along to replace all the Fortran 95 routines, but it would be beneficial to give the option of using faster routines for some basic operations such as basic spherical harmonic transforms and expansions.
Proposal
Methods such as
.expand()
should accept an optional argumentbackend
that will identify the software package used to perform the spherical harmonic transform. The default value, until a more suitable backend is found, will be 'shtools' for the Fortran 95 SHTOOLS archive.Requirements