Because paywalls are 💩, here is the paper:
@hottad maybe you can clarify: if IFS/ectrans loops until `nfreq = floor((nlon - 1)/(2 + coslat^2)) - 1`,
then towards the equator this is effectively nlon/3, hence quadratic and not cubic. I currently don't understand why you wouldn't want to loop only until nlon/4 at the Equator for a cubic truncation? Visualised this looks like
with IFS being quadratic at the equator but a coslat^2 scaling towards linear near the poles. Could we do something like what I labelled "new", which is also linear near the poles and uses a coslat scaling towards cubic at the equator? That would save another 10% in the Legendre transform compared to IFS. The first few rings in comparison (ring, nlon, ifs, new):
| ring | nlon | ifs | new |
|------|------|-----|-----|
| 1    | 20   | 8   | 11  |
| 2    | 24   | 10  | 13  |
| 3    | 28   | 12  | 15  |
| 4    | 32   | 14  | 16  |
| 5    | 36   | 16  | 18  |
| 6    | 40   | 18  | 19  |
| 7    | 44   | 20  | 21  |
| 8    | 48   | 22  | 22  |
which for "new" is truly linear for rings 1, 2, 3 and then only drops off slowly.
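As a sanity check, the quoted IFS formula is easy to evaluate directly; a minimal sketch (the coslat values are my rough approximations of the O80 ring latitudes, not exact Gaussian latitudes):

```julia
# IFS shortcut as quoted above
ifs_nfreq(nlon, coslat) = floor(Int, (nlon - 1)/(2 + coslat^2)) - 1

ifs_nfreq(20, 0.0098)   # ring 1 (lat ≈ 89.4°): 8, matches the table, ≈ nlon/2 (linear)
ifs_nfreq(48, 0.1467)   # ring 8 (lat ≈ 81.6°): 22, matches the table
ifs_nfreq(48, 1.0)      # same nlon at the equator: 14 ≈ nlon/3 (quadratic, not nlon/4)
```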
Hi Milan @milankl, thank you for this plot! I honestly do not understand much about this, but I found something interesting about the IFS formula.
In addition to zonal aliasing, there is another factor to consider when we determine the size of the loop over $m$. At latitudes outside of the Tropics, the associated Legendre function decays quickly as $m$ gets larger. Thus, if we define some threshold ϵ below which we can ignore $P_n^m(\sin\mathrm{lat})$, we can define an $m$ above which we do not need to loop when performing the direct or inverse Legendre transform. If we call such $m$ `m_to_retain(ϵ)`, I found that `m_to_retain(ϵ)` is rather insensitive to the exact choice of ϵ, and interestingly, the IFS formula gives a good approximation to `m_to_retain(ϵ)`.
This is a plot for the O80 grid (nlat_half=80) with Nmax=79 truncation (cubic). As we can see, the IFS formula and `m_to_retain(ϵ=1e-6)` agree quite well.
Please have a look at this Jupyter notebook https://gist.github.com/hottad/5536882443cb5d018bbd57a88f10eb0e for precise definitions and derivation (and code).
I do not know if this is related to how the IFS formula was devised, but it seems like too good an agreement to be just a coincidence.
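In code, the definition above boils down to something like this sketch (simplified; the notebook has the precise version):

```julia
using AssociatedLegendrePolynomials

# Largest m for which any normalised |λ_n^m(sin(lat))| with m ≤ n ≤ Nmax
# still exceeds ϵ; for larger m every contribution to the Legendre
# transform at this latitude is below ϵ and the m-loop can be skipped.
function m_to_retain(lat_deg, Nmax; ϵ=1e-6)
    Λ = λlm!(zeros(Nmax+1, Nmax+1), Nmax, Nmax, sind(lat_deg))  # Λ[n+1, m+1] = λ_n^m
    for m in Nmax:-1:0
        maximum(abs, @view Λ[m+1:end, m+1]) ≥ ϵ && return m
    end
    return 0
end
```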
In the above post I forgot to mention that this `m_to_retain` plot is basically a reproduction of Figure 3 (and other similar plots) of Courtier and Naughton 1994, except that Courtier and Naughton used quadratic truncation whereas I plotted for cubic truncation.
I computed up to Tc2559 with the same code. The results below:
The IFS formula approximates `m_to_retain(ϵ=1e-6)` quite well at lower resolutions, but as the resolution (truncation wavenumber) increases, it starts to deviate from `m_to_retain(ϵ=1e-6)` outside the polar areas.
Extrapolating the trend, I guess your "new" formula will be a better alternative to the IFS formula at very high resolutions, like Tc3999 and beyond, but perhaps that's outside the scope of SpeedyWeather? If we use the "new" formula at the resolutions tested here, we will ignore a lot of $P_n^m(\sin\mathrm{lat})$ that are not small, so we may end up with an inexact associated Legendre transform.
Alternatively, do you think it makes sense to compute `m_to_retain(ϵ)` for some reasonable ϵ within the constructor of the `SpectralTransform` type? This may slow down the setup process, but it will save transform cost compared to the IFS formula, especially at relatively high resolutions.
In any case, we should decide which way to go by doing experiments.
@samhatfield @milankl Any thoughts?
Quick comment: I believe the dip in `m_to_retain` between 500 and 1700 at T2559 is due to issues with the computation of the Legendre polynomials at very high resolution, similar to https://github.com/jmert/AssociatedLegendrePolynomials.jl/issues/27 (which is the package we are currently using to do that).
General comment: I agree that we would need to test these ideas while actually running the model. Daisuke, do you know of a good test case to do that?
In general, I like the `m_to_retain` idea, but I'm not sold that this is actually what we want: the reason is that a cubic truncation is a form of filtering, so we actually want a certain error if that means we can scale-selectively filter out some high-frequency waves. I believe all of these methods are idempotent, meaning you may start with a field that contains some waves that shouldn't be representable with a cubic truncation, but once they are filtered out the transform should approach exactness (up to rounding errors). In that sense, I see the `m_to_retain` idea as an upper bound of $m$ beyond which we shouldn't loop, but maybe we do want to shortcut the loop over $m$ even further.
Going forward I suggest we create a parameter like `shortcut_legendre::Symbol` with options like `:linear, :quadratic, :cubic, :ifs, :lincub_coslat` which implements exactly these formulas and precomputes them in `SpectralTransform`. In `spectral!` we then just load those, such that we don't have to touch the spectral transforms themselves, only the precomputation when creating a `SpectralTransform`.
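A sketch of that precomputation, one `nfreq` per latitude ring (the `:lincub_coslat` formula here is just one candidate coslat interpolation between linear at the poles and cubic at the equator; the exact form is still to be decided):

```julia
# nlons: number of longitude points per ring, coslats: cos(latitude) per ring
function get_nfreqs(shortcut_legendre::Symbol, nlons, coslats)
    map(zip(nlons, coslats)) do (nlon, coslat)
        shortcut_legendre == :linear        ? nlon÷2 + 1 :
        shortcut_legendre == :quadratic     ? nlon÷3 + 1 :
        shortcut_legendre == :cubic         ? nlon÷4 + 1 :
        shortcut_legendre == :ifs           ? floor(Int, (nlon - 1)/(2 + coslat^2)) - 1 :
        shortcut_legendre == :lincub_coslat ? floor(Int, nlon/(2 + 2coslat)) + 1 :
        error("unknown shortcut $shortcut_legendre")
    end
end
```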
And I've just created a plot that compares `m_to_retain`, `ifs` and `lincub_coslat`, which is the linear-to-cubic transition via coslat scaling. And I've added the savings in % (total loop iterations over m across all latitude rings) relative to looping over `1:mmax+1` for every ring.
Thank you Milan. I agree, `m_to_retain` should be interpreted as the upper bound of the size of the $m$-loop. What is not entirely clear to me is whether we can ensure orthonormality of $Y_n^m(\mathrm{lon},\mathrm{lat})$ if we reduce the $m$-loop beyond `m_to_retain`. Perhaps, before trying model runs, we should start by checking the orthonormality (I mean: if we initialize `alm` with the (l,m)-element set to one and all the others zero, and do a round trip of transforms by successively calling `gridded!()` and then `spectral!()`, does that perfectly (within the desired tolerance) restore the original `alm`, and does this hold for any pair of (l,m)?). I guess this is equivalent to what you call idempotency?
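Something like this sketch is what I have in mind (assuming preallocated spectral coefficients `alm` and a gridded field `grid` matching the `SpectralTransform` `S`; the l ≥ m indices follow the lower-triangular storage):

```julia
# Round-trip check: every single-harmonic field should survive
# gridded! followed by spectral! within some tolerance.
function check_roundtrip(alm, grid, S; tol=1e-14)
    alm2 = similar(alm)
    for m in 1:size(alm, 2), l in m:size(alm, 1)
        fill!(alm, 0)
        alm[l, m] = 1                    # single spherical harmonic
        gridded!(grid, alm, S)           # spectral -> grid
        spectral!(alm2, grid, S)         # grid -> spectral
        err = maximum(abs, alm2 - alm)   # should vanish within tolerance
        err > tol && @warn "round-trip error $err at (l, m) = ($l, $m)"
    end
end
```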
> Going forward I suggest we create a parameter like `shortcut_legendre::Symbol` with options like `:linear, :quadratic, :cubic, :ifs, :lincub_coslat` which implements exactly these formulas and precomputes them in `SpectralTransform`.
Thanks. That's a good idea. Will you be able to implement and test this?
> Quick comment: I believe the dip in `m_to_retain` between 500 and 1700 at T2559 is due to issues with the computation of the Legendre polynomials at very high resolution similar to jmert/AssociatedLegendrePolynomials.jl#27 (which is the package we are currently using to do that).
This underflow/overflow problem is a well-known issue. Just in case you are not aware, there are two known ways to resolve it.
One way is to use an exponent-extension method, i.e., represent Pnm with a struct (called an X-number) which pairs an integer (to store the extra exponent) with a floating-point number (either Float32 or Float64), and override operators like mul(Float, X-number) and add(X-number, X-number) with bespoke methods. A detailed explanation and sample code in Fortran are documented in Fukushima (2012, Journal of Geodesy) (it's behind a paywall but the author's copy on researchgate.net is available from here). It should be easy to implement X-numbers in Julia as something like `Xnumber{T} <: AbstractFloat where T`. Experience from a Fortran 90 implementation is that the performance penalty from using the X-number representation is quite small.
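A minimal sketch of the idea in Julia, with only the two operations the recurrence needs (the constant and the renormalisation thresholds are simplified compared to Fukushima's paper):

```julia
# Value represented is x * BIG^i, with x kept in a safe Float64 range.
struct XNumber{T<:AbstractFloat}
    x::T     # significand
    i::Int   # extra exponent
end

const BIG  = 2.0^960    # power of two, so rescaling is exact
const BIGI = 2.0^-960

# bring x back into range, adjusting the integer exponent
renorm(a::XNumber) = abs(a.x) ≥ BIG      ? XNumber(a.x * BIGI, a.i + 1) :
                     0 < abs(a.x) < BIGI ? XNumber(a.x * BIG,  a.i - 1) : a

Base.:*(c::Real, a::XNumber) = renorm(XNumber(c * a.x, a.i))

function Base.:+(a::XNumber, b::XNumber)
    a.i == b.i && return renorm(XNumber(a.x + b.x, a.i))
    a.i <  b.i && return b + a
    # align b to a's exponent; if the gap is large, b underflows to 0, as intended
    return renorm(XNumber(a.x + b.x * BIGI^(a.i - b.i), a.i))
end

# convert back; only exact once the extra exponent has returned to 0
Base.Float64(a::XNumber) = a.i == 0 ? Float64(a.x) : a.x * BIG^float(a.i)
```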
Alternatively, when you compute Pnm by a recurrence formula like Belousov's, you can check whether Pnm goes above/below some thresholds (like 1e+16 and 1e-16 for Float64) and, if that occurs, apply scaling by some factor (e.g., 2^(-20) or 2^20; these can be arbitrary but should be powers of 2 to avoid rounding errors), keeping track of how many times the scaling was applied for each Pnm. This is what is done in the IFS and in JMA's global model. Wedi et al. (2013), Section 2, very briefly mentions this strategy in the IFS:

> Equation (9) is also unstable but can be made stable by tracking the values of the numbers and keeping them within an acceptable range that can be represented with double precision
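In sketch form, for a generic three-term recurrence with placeholder coefficients (not the actual Belousov coefficients):

```julia
const SCALE  = 2.0^20       # power of two => rescaling is exact
const LO, HI = 1e-16, 1e16  # thresholds as mentioned above

# Run the recurrence P_{k+1} = a*P_k + b*P_{k-1} in Float64, rescaling both
# terms whenever the value leaves [LO, HI], and count the net rescalings.
function recurrence_with_rescaling(P0, P1, coeffs)
    nscale = 0
    for (a, b) in coeffs
        P0, P1 = P1, a*P1 + b*P0
        if abs(P1) > HI
            P0 /= SCALE; P1 /= SCALE; nscale += 1
        elseif 0 < abs(P1) < LO
            P0 *= SCALE; P1 *= SCALE; nscale -= 1
        end
    end
    return P1, nscale        # true value ≈ P1 * SCALE^nscale
end
```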
The first approach Daisuke suggests can already be done easily with AssociatedLegendrePolynomials.jl as it's type-flexible, but currently quadmath arithmetic like Float128 isn't fast.
```julia
julia> using AssociatedLegendrePolynomials, Quadmath, BenchmarkTools

julia> Λ = zeros(Float32, 3001, 3001);  # always store in a Float32 array

julia> @btime λlm!(Λ, 3000, 3000, Float32(cos(π/4)));  # the type of cos(colat) determines the format used for calculation
  18.604 ms (5 allocations: 80 bytes)

julia> @btime λlm!(Λ, 3000, 3000, Float64(cos(π/4)));
  22.545 ms (5 allocations: 80 bytes)

julia> @btime λlm!(Λ, 3000, 3000, Float128(cos(π/4)));
  6.238 s (6 allocations: 192 bytes)
```
Although I have quite some expertise in writing new number formats, I'm not particularly keen to implement an X-number myself. On the other hand, how hard can it be? 😆 The good thing in Julia is that you don't have to overload all the arithmetic.
The 2nd approach would require some tweaks to AssociatedLegendrePolynomials, @jmert you may find this interesting.
This is an issue to collect ideas around the linear vs quadratic vs cubic truncation when looping over the zonal wavenumbers $m$. With #127 we'll loop until `nfreq = nlon÷2 + 1`, with `nlon` the number of longitude points at a given latitude ring. So for full grids the loop goes over all $m$, but for reduced grids this can be shortened. IFS seems to be doing the following (`j`, `nlon`, `nfreq`) at Tco79. So that's about 3 fewer m around the poles and less than half at the Equator, and apparently follows the formula `nfreq = floor((nlon - 1)/(2 + coslat^2)) - 1`. At that resolution this is, per hemisphere, 4784 loops over m compared to 6400 in total (80 rings, each with 80 orders). Meaning there's a good amount of performance gain possible if we follow something similar. What needs to be checked though is how much this applies to other grids and whether we can directly formulate this as a function of the truncation order. More information on this apparently in Courtier and Naughton 1994.
@hottad @samhatfield