Closed mreineck closed 3 years ago
Demo output of ducctest.py
on a single core of an AMD Ryzen 7 4800H CPU.
martin@marvin:~/codes/ssht$ python3 src/pyssht/ducctest.py
flm rotation tests:
L= 32: L2 error=7.787688e-15, speedup factor=2.350531
L= 64: L2 error=1.828852e-14, speedup factor=5.006331
L= 128: L2 error=4.888588e-14, speedup factor=9.991688
L= 256: L2 error=1.428483e-13, speedup factor=17.211348
L= 512: L2 error=4.099982e-13, speedup factor=26.752970
L=1024: L2 error=1.119593e-12, speedup factor=33.509578
SHT tests:
MW , L= 32, Reality=False, Spin=0: L2 error=1.069220e-14, speedup factor=45.675511
MW , L= 32, Reality=False, Spin=1: L2 error=1.131743e-14, speedup factor=0.972574
MW , L= 32, Reality=True , Spin=0: L2 error=1.105658e-14, speedup factor=86.940989
MW , L= 64, Reality=False, Spin=0: L2 error=1.764786e-14, speedup factor=11.038193
MW , L= 64, Reality=False, Spin=1: L2 error=1.926463e-14, speedup factor=1.380736
MW , L= 64, Reality=True , Spin=0: L2 error=1.639798e-14, speedup factor=59.431823
MW , L= 128, Reality=False, Spin=0: L2 error=5.021204e-14, speedup factor=61.757347
MW , L= 128, Reality=False, Spin=1: L2 error=6.262586e-14, speedup factor=4.728689
MW , L= 128, Reality=True , Spin=0: L2 error=4.317444e-14, speedup factor=14.079249
MW , L= 256, Reality=False, Spin=0: L2 error=1.011778e-13, speedup factor=18.465855
MW , L= 256, Reality=False, Spin=1: L2 error=1.236980e-13, speedup factor=5.040028
MW , L= 256, Reality=True , Spin=0: L2 error=1.058034e-13, speedup factor=19.899911
MW , L= 512, Reality=False, Spin=0: L2 error=3.113635e-13, speedup factor=18.181369
MW , L= 512, Reality=False, Spin=1: L2 error=2.290599e-13, speedup factor=11.047243
MW , L= 512, Reality=True , Spin=0: L2 error=2.281300e-13, speedup factor=40.174040
MW , L=1024, Reality=False, Spin=0: L2 error=5.232240e-13, speedup factor=21.449102
MW , L=1024, Reality=False, Spin=1: L2 error=5.493335e-13, speedup factor=14.823629
MW , L=1024, Reality=True , Spin=0: L2 error=4.972997e-13, speedup factor=40.481276
MW , L=2048, Reality=False, Spin=0: L2 error=3.060674e-12, speedup factor=87.109723
MW , L=2048, Reality=False, Spin=1: L2 error=1.876659e-12, speedup factor=36.758097
MW , L=2048, Reality=True , Spin=0: L2 error=2.114590e-12, speedup factor=47.269750
MWSS, L= 32, Reality=False, Spin=0: L2 error=7.584225e-15, speedup factor=52.670301
MWSS, L= 32, Reality=False, Spin=1: L2 error=8.188209e-15, speedup factor=1.330590
MWSS, L= 32, Reality=True , Spin=0: L2 error=7.136339e-15, speedup factor=45.936444
MWSS, L= 64, Reality=False, Spin=0: L2 error=1.826959e-14, speedup factor=58.559479
MWSS, L= 64, Reality=False, Spin=1: L2 error=2.488170e-14, speedup factor=3.621226
MWSS, L= 64, Reality=True , Spin=0: L2 error=1.595614e-14, speedup factor=100.420469
MWSS, L= 128, Reality=False, Spin=0: L2 error=3.753735e-14, speedup factor=8.173109
MWSS, L= 128, Reality=False, Spin=1: L2 error=3.884384e-14, speedup factor=6.621827
MWSS, L= 128, Reality=True , Spin=0: L2 error=3.491795e-14, speedup factor=55.736038
MWSS, L= 256, Reality=False, Spin=0: L2 error=7.448911e-14, speedup factor=16.896459
MWSS, L= 256, Reality=False, Spin=1: L2 error=9.002071e-14, speedup factor=9.921884
MWSS, L= 256, Reality=True , Spin=0: L2 error=7.279597e-14, speedup factor=37.560571
MWSS, L= 512, Reality=False, Spin=0: L2 error=1.774534e-13, speedup factor=23.261393
MWSS, L= 512, Reality=False, Spin=1: L2 error=2.403929e-13, speedup factor=17.595352
MWSS, L= 512, Reality=True , Spin=0: L2 error=1.829328e-13, speedup factor=40.459189
MWSS, L=1024, Reality=False, Spin=0: L2 error=5.854770e-13, speedup factor=36.144677
MWSS, L=1024, Reality=False, Spin=1: L2 error=4.506434e-13, speedup factor=24.387793
MWSS, L=1024, Reality=True , Spin=0: L2 error=4.687843e-13, speedup factor=57.871766
MWSS, L=2048, Reality=False, Spin=0: L2 error=2.298029e-12, speedup factor=71.017774
MWSS, L=2048, Reality=False, Spin=1: L2 error=1.127773e-12, speedup factor=79.374561
MWSS, L=2048, Reality=True , Spin=0: L2 error=1.792836e-12, speedup factor=224.463276
GL , L= 32, Reality=False, Spin=0: L2 error=1.840703e-11, speedup factor=29.883576
GL , L= 32, Reality=False, Spin=1: L2 error=1.883529e-11, speedup factor=1.666013
GL , L= 32, Reality=True , Spin=0: L2 error=1.862580e-11, speedup factor=47.404505
GL , L= 64, Reality=False, Spin=0: L2 error=2.006364e-11, speedup factor=25.033709
GL , L= 64, Reality=False, Spin=1: L2 error=2.003057e-11, speedup factor=1.718936
GL , L= 64, Reality=True , Spin=0: L2 error=2.057525e-11, speedup factor=2.316247
GL , L= 128, Reality=False, Spin=0: L2 error=4.496383e-11, speedup factor=12.278235
GL , L= 128, Reality=False, Spin=1: L2 error=4.434178e-11, speedup factor=9.359636
GL , L= 128, Reality=True , Spin=0: L2 error=4.500517e-11, speedup factor=20.092019
GL , L= 256, Reality=False, Spin=0: L2 error=9.362785e-11, speedup factor=21.250935
GL , L= 256, Reality=False, Spin=1: L2 error=9.367738e-11, speedup factor=10.492432
GL , L= 256, Reality=True , Spin=0: L2 error=9.360139e-11, speedup factor=14.862463
GL , L= 512, Reality=False, Spin=0: L2 error=1.449155e-10, speedup factor=29.151748
GL , L= 512, Reality=False, Spin=1: L2 error=1.447672e-10, speedup factor=20.533196
GL , L= 512, Reality=True , Spin=0: L2 error=1.450029e-10, speedup factor=35.325782
GL , L=1024, Reality=False, Spin=0: L2 error=1.882280e-10, speedup factor=38.456142
GL , L=1024, Reality=False, Spin=1: L2 error=1.881660e-10, speedup factor=26.067922
GL , L=1024, Reality=True , Spin=0: L2 error=1.881754e-10, speedup factor=42.859991
GL , L=2048, Reality=False, Spin=0: L2 error=1.312866e-02, speedup factor=51.301227
GL , L=2048, Reality=False, Spin=1: L2 error=1.307702e-02, speedup factor=32.884071
GL , L=2048, Reality=True , Spin=0: L2 error=1.296155e-02, speedup factor=58.415200
DH , L= 32, Reality=False, Spin=0: L2 error=1.761299e-11, speedup factor=2.191109
DH , L= 32, Reality=False, Spin=1: L2 error=1.834118e-11, speedup factor=2.579484
DH , L= 32, Reality=True , Spin=0: L2 error=1.776948e-11, speedup factor=2.405983
DH , L= 64, Reality=False, Spin=0: L2 error=2.001454e-11, speedup factor=1.932899
DH , L= 64, Reality=False, Spin=1: L2 error=2.011957e-11, speedup factor=1.856050
DH , L= 64, Reality=True , Spin=0: L2 error=1.992613e-11, speedup factor=2.354581
DH , L= 128, Reality=False, Spin=0: L2 error=4.485695e-11, speedup factor=10.512614
DH , L= 128, Reality=False, Spin=1: L2 error=4.478997e-11, speedup factor=10.532030
DH , L= 128, Reality=True , Spin=0: L2 error=4.466421e-11, speedup factor=15.247348
DH , L= 256, Reality=False, Spin=0: L2 error=9.337148e-11, speedup factor=12.130759
DH , L= 256, Reality=False, Spin=1: L2 error=9.362291e-11, speedup factor=10.544331
DH , L= 256, Reality=True , Spin=0: L2 error=9.357068e-11, speedup factor=16.058156
DH , L= 512, Reality=False, Spin=0: L2 error=1.449665e-10, speedup factor=26.412729
DH , L= 512, Reality=False, Spin=1: L2 error=1.447460e-10, speedup factor=20.015510
DH , L= 512, Reality=True , Spin=0: L2 error=1.447082e-10, speedup factor=30.894329
DH , L=1024, Reality=False, Spin=0: L2 error=1.882401e-10, speedup factor=39.573818
DH , L=1024, Reality=False, Spin=1: L2 error=1.881496e-10, speedup factor=27.040965
DH , L=1024, Reality=True , Spin=0: L2 error=1.882652e-10, speedup factor=46.202585
DH , L=2048, Reality=False, Spin=0: L2 error=1.318579e-02, speedup factor=55.183716
DH , L=2048, Reality=False, Spin=1: L2 error=1.276778e-02, speedup factor=34.222753
DH , L=2048, Reality=True , Spin=0: L2 error=1.289799e-02, speedup factor=61.578388
Thanks @mreineck, very interesting. We'll take a look at this in more detail and follow up with any questoins shortly.
@CosmoMatt Can you take a look into this?
@mreineck, I've made a few changes to make the PR a bit more inline with the rest of the code. Mostly, it comes down to:
Do let me know if I've bungled anything.
Hi @mdavezac, I'm perfectly happy with all the changes, perhaps with a small question mark for
get rid of the globals and the select_backend functions in favor of an extra keyword. I have a personal aversions to globals and state
I totally understand (and share!) your aversion against global state in a library. But on the other hand requiring the explicit setting of a key word in every transform call puts quite a heavy burden on the users (especially those updating an already existing code) , so in that particular case I would personally have erred on the side of convenience :-) Still it's of course entirely your call.
Would you be interested in the adjoint transforms as well? I should be able to produce at least the adjoint synthesis fairly quickly; adjoint analysis might take a bit longer.
Thanks for your comments @mreineck. I'll leave it to @mdavezac to comment on the issues of globals vs convenience.
On the adjoints side, it would indeed be very useful to support that also. Great if you're able to add that!
I added inverse_adjoint
, which should be the one that is used much more frequently.
I also tweaked the documentation a little bit, so that users don't accidentally precompute the dl_array
when rotating spherical harmonics with ducc
; that would be quite a waste :-)
One remaining question: (how) do we advertise the nthreads
parameter? It could be added to the documentation of the respective functions, but I don't know your preferences. Overall I would recommend to mention this parameter, since the scaling is quite good, even though it's not really linear.
Sorry if I messed up the code formatting again; if you have an auto-formatting command and let me know, I'm happy to use it.
We are using black for formatting. Otherwise, sounds good, I'll take a look ASAP.
Not sure if the CI failure is related to my changes ... at first glance it seems to be caused by added checks in test_pyssht.py
but not actually originate in the new backend.
@mreineck, I've added something in the docs about nthreads and cleaned up the code some more. I think the failing test was just some numerical noise flakiness. As for globals, I'd rather inconvenience users than risk annoying mistakes because of internal state in ssht 😉. But I do get your point!
I'll ask for any other input from the team, but otherwise I think we are good to go.
Perfect, thanks a lot for the polishing work! Everything's fine from my side.
@mreineck @mdavezac Thanks both for your efforts on this!
I just wanted to check what the situation is re testing.
Do we have unit tests that compare the ducc0
harmonic coefficients with those computed by ssht
? Or are we just comparing consistency of each approach?
Similarly for the inverse_adjoint
, how is this tested?
I haven't had a chance to look at the new code so apologies for these simple questions but thought it would be quickest to just ask you guys the situation.
If I read the tests correctly, they directly compare the results of transforms done via ssht
and ducc0
(including inverse_adjoint
), so that should be safe.
@jasonmcewen, yes we are testing ducc0 vs ssht results for the forward, inverse, and adjoint_inverse transforms as well as the rotations in harmonic space. Comparisons are to a relative tolerance of 1e-6 and an absolute tolerance of 1e-12. We check MW, MWSS, DH, GL, spins 0,1,2, real and complex variations.
@mdavezac @mreineck Great! Shall we merge?
Ready when you are!
One thing that might be useful to mention is that the ducc
SHTs are accurate way beyond the limits imposed by some of the ssht
transforms (L \approx 1800 for GL and DH, I think), but I didn't find this in he documentation.
@mreineck, Thanks for everything. We've made a new release. It should come up on pypi soon.
For completeness, would it be difficult to add the forward adjoint as well?
For completeness, would it be difficult to add the forward adjoint as well?
Not really difficult, but some decent amount of work, I expect. For "classic" grids this would just be "run a synthesis and then apply the quadrature weights", but for MW and MWSS it is a bit trickier: I'd have to "transpose" my complete analysis machinery, which for these kinds of grid contains a lot of tweaks, shortcuts etc. If this has a practical use somewhere I can try and implement this, but so far I have not encountered one.
Thanks for your comments @mreineck.
There are practical applications of the forward adjoints since we're using them in solving convex optimization problems on the sphere. For example, see this paper and this paper. We're applying these techniques for a variety of applications, for example for mass mapping (see here) and also some upcoming works on seismic imaging. I'm very interested to see if we can use ducc0
for these problems since they become quite computationally demanding (particularly when going to the ball).
But I appreciate there is quite a lot of work involved to support this. Perhaps we can add it to the wish list for someday? And when these other projects progress further perhaps we could also get a student or postdoc involved to help with the implementation?
Thanks for the pointers, @jasonmcewen! So far I have only ever seen optimization schemes using the inverse/adjoint_inverse pair. I'll have a go at the implementation over the next few weeks and keep you updated!
That would be amazing if you were able to look into these implementations at some point over the coming weeks @mreineck! Let us know if we can help at all.
And thanks for all your efforts updating ssht
to support ducc0
here too. This will make it very easy for us to pick up ducc0
now for various applications. We hope to give this a try soon and will let you know how it goes.
OK, the easy part of this seems to be really easy: I think I have a properly working code for Gauss-Legendre grids and Driscoll-Healy grids (unfortunately "my" DH grids seems to be different from "yours"; what ssht
defines as "DH" is "Fejer's second rule" in ducc
terminology).
So I should be able to provide a prototype very quickly, but ssht
will only be able to use it for Gauss-Legendre grids initially. Do you see any benefit in having this particular functionality early, e.g. for testing purposes?
And thanks for all your efforts updating ssht to support ducc0 here too. This will make it very easy for us to pick up ducc0 now for various applications. We hope to give this a try soon and will let you know how it goes.
I'm very much looking forward to hear your experiences with this! In principle, if you work at high band limits and make use of multithreading, you should see speedups way beyond a factor 100.
Great you've managed to start taking a look at the forward adjoint already! Tbh, the DH and GL cases were just included initially for some simple comparisions but we've never really used those. I don't think ssht
even implements adjoints for those cases. So I don't think we're really be able to do much testing there. The cases of most interest are the MW and MWSS sampling.
I think I discovered an interesting problem ...
While the inverse
and inverse_adjoint
operators are pretty unambiguous and demonstrably do the same thing in ssht
and ducc0
, the same is not generally true for forward
and forward_adjoint
. As a concrete example, it is not really specified how the forward
operator sould behave when it encounters a map that does not have constant values at the polar rings (assuming the spin is zero). And I have the impression that our implementations differ in that respect. So far the unit tests don't show this because the forward
operator only encounters sane maps there, i.e. ones with a uniform value in the polar rings.
Any suggestions how to break this degeneracy?
One option might be to design forward_adjoint
in such a way that it only ever generates uniform values on polar rings (for spin=0, higher moments are OK for higher spins), and then derive the exact structure of forward
from forward_adjoint
. As far as I can tell, the ssht
incarnation of forward_adjoint
currently doesn't have this property.
Thanks @mreineck. I'm not sure I completely follow. For a spin=0
signal the ring of pixels at the pole(s) should contain identical values. It's only for spin<>0
signals that the values on the polar ring(s) would differ (due to the spin rotation factor). So I'm not sure it's meaningful to consider maps that for have different values on the polar ring(s). It seems like that might be what you're suggesting in your latest message?
For a spin=0 signal the ring of pixels at the pole(s) should contain identical values.
If the map is band limited, that will indeed be the case. But even if you run ssht
's forward_adjoint
on a random set of spherical harmonic coefficients, you normally end up with some variation in these pixels. For an example, try:
import pyssht as ssht
import numpy as np
def complex_coeffs(order, rng):
coeffs = rng.random((order * order, 2), dtype="float64")
return coeffs @ np.array([1, 1j])
def real_coeffs(order, complex_coeffs):
pos_index = [ssht.elm2ind(el, m) for el in range(order) for m in range(1, el + 1)]
neg_index = [ssht.elm2ind(el, -m) for el in range(order) for m in range(1, el + 1)]
complex_coeffs[neg_index] = np.conjugate(complex_coeffs[pos_index])
return complex_coeffs
L=5
coeffs=real_coeffs(L,complex_coeffs(L, np.random.default_rng()))
image = ssht.forward_adjoint(coeffs, L, Reality=True, Method="MWSS", Spin=0)
import matplotlib.pyplot as plt
plt.imshow(image)
plt.show()
While the variation in the polar rings is smaller than elsewhere, it is far from being numerical noise.
ducc0
's forward_adjoint
operator produces a similar, but not identical image which also has variations on the polar rings.
What I'm trying to say is: if the forward
operator is only defined by the requirement that it should accurately recover the input spherical harmonic coefficients from band limited maps, then this definition is not unique. And if ssht
and ducc0
choose two different forward
operators (which both fulfill the requirement), then the adjoint counterparts will also be different, and running both on the same set of flm
will produce different results.
So, what should we compare?
Thanks for the clarification @mreineck. Ok, I'm with you now.
Perhaps we can therefore avoid comparing ssht
and ducc0
and therefore just check each is self-consistent with the dot product test?
Perhaps we can therefore avoid comparing ssht and ducc0 and therefore just check each is self-consistent with the dot product test?
Absolutely! This is what I currently do internally, and I'll also enhance the ducc0
tests in ssht
accordingly.
Unfortunately even with the dot product test I'm not successful when testing forward
and forward_adjoint
directly against each other; and the same seems to be true for ssht
, since you take a detour and actually test inverse
against forward_adjoint
: https://github.com/astro-informatics/ssht/blob/126e26ee026c7f6871828a3d2cd6d5836626f446/tests/test_pyssht.py#L87-L96.
My linear algebra is too rusty to understand why this is necesary ... the inverse/inverse_adjoint
pair doesn't require this sort of complication: https://github.com/astro-informatics/ssht/blob/126e26ee026c7f6871828a3d2cd6d5836626f446/tests/test_pyssht.py#L73-L82
Here it is possible to draw a completely random map (which is certainly not band limited and won't have uniform values at the poles), and the test still works fine.
All of these operations should really consider bandlimited signals. So it makes sense that the forward adjoint test makes a bandlimited test signal f
for test purposes. You could take this f
and use ssht.forward
to compute an equivalent flm
but that will be the same as the one at hand and so is not necessary.
As you point out, a bandlimited test signal is not constructed for f_prime
for the inverse adjoint test. If that were done, the test would obviously still pass. I think the reason the invese adjoint test still works in this setting is because the inverse adjoint annihilates the non-bandlimited component. Consider an inverse transform that can operate with signals at a higher bandlimit that considered simply by ignoring the higher ell
terms. Then the adjoint will simply fill high ell
terms with zeros, effectively annihilating any signal content above the bandlimit of interest. Effectively this follows since application of the inverse transform (defined at L) to a non-bandlimited signal will bandlimit the signal. That is not the case for the forward transform. I'd need to give this some further thought to specify this more precisely but I think that is essentially where the difference is coming from.
I still think that (in principle!) the dot product test should work for any input on both sides, and that's simply because both forward
and forward_adjoint
can (again very much in principle ;-) be written as matrices ... and if they are adjoint to each other, the test will succeed, no matter what the input is (ignoring numerical inaccuracies for simplicity's sake).
Perhaps the crucial point is that the forward
operation for MW and MWSS does not in fact act on a "normal" spherical map but rather on its extension to the Double Fourier Sphere,and that we need to take this into account in some way.
A similar problem happened to me when I first implemented the dot product test for the inverse/inverse_adjoint
pair: since ducc0
only stores flm
with nonnegative m
when working on real-valued maps, I never managed to get the test to work until I took the redundant coefficients with negative m
into account.
Anyway ... this is something I will investigate on the side over the next months or so. I'll let you know when I have new insights!
Hmm, yes that is a good point. I will need to give this some further thought.
This is the result of a small exercise I made to verify the flexibility of my new Python SHT interface in the
ducc0
package.The pull request is analogous to https://github.com/SHTOOLS/SHTOOLS/pull/298 (prompted by https://github.com/SHTOOLS/SHTOOLS/issues/258) and adds an optional back-end for carrying out SHTs and flm rotations, which improves performance in most situations. If this is functionality you'd like to see in
pyssht
, please let me know, otherwise feel free to close the PR.Currently implemented:
pyssht.forward
andpyssht.inverse
for all pixelizations exceptMW_pole
. The adjoint transforms andMW_pole
support can be added easily if you like the PR.pyssht.rotate_flms
src/pyssht/ducctest.py
)Beneficial features of the
ducc0
algorithms:L
and much lower memory overhead forrotate_flms
.The code can be tested by installing this branch, doing a
pip install ducc0
and finallypython ducctest.py
.Compared to my earlier PR for intergrating
libsharp
support (https://github.com/astro-informatics/ssht-archive/pull/5) this PR is much less invasive and has no impact at all if theducc0
package is not present on the user machine.Please let me know what you think!