flintlib / arb

Arb has been merged into FLINT -- use https://github.com/flintlib/flint/ instead
http://arblib.org/
GNU Lesser General Public License v2.1

Feedback regarding rigorous numerical integration in ARB #196

Open duhadler opened 6 years ago

duhadler commented 6 years ago

I just read your blog entry regarding rigorous numerical integration with ARB. This seems to be an amazing piece of additional functionality, and I am looking forward to the forthcoming next release.

Since you were asking for feedback, I would propose as an additional feature a dedicated function using contour integration for computing derivatives, with an interface similar to what is already available in mpmath.

In terms of applications, I am using verified integration mostly for the evaluation of cumulative distribution functions in various statistical applications, so far using the algorithm described below. Many of the integrands are just special cases of a hypergeometric function multiplied by something else. It turns out that there is often not just one optimal algorithm; the best performance is achieved by a mixture of finite sums, infinite sums, continued fractions and numerical integration. A typical example is the incomplete beta function I(x; a, b), where all of the above algorithms with rigorous error bounds have a role, but only numerical integration provides a verified result quickly when a and b are VERY large and x is close to the median of the distribution.

My own experience with using ARB for verified integration is based on the following paper (I assume that you were hinting at this or related algorithms in your blog): Naoya Yamanaka, Tomoaki Okayama, Shin’ichi Oishi, and Takeshi Ogita: A fast verified automatic integration algorithm using double exponential formula. Nonlinear Theory and Its Applications, IEICE, 2010, vol. 1, no. 1, pp. 119–132. https://www.researchgate.net/publication/258421351_A_fast_verified_automatic_integration_algorithm_using_double_exponential_formula

The rationale behind this paper is similar to what you have used for GL, and the algorithm is straightforward to implement, except for the need to determine a rigorous upper bound for K in equation (6). Since the domain phi(D), defined between equations (5) and (6) of the paper, is not rectangular, one needs to compute a rectangular domain, say DRect, which includes phi(D). There seems to be no explicit formula for DRect, but it can be calculated iteratively, observing that for points x + yi in DRect, the maximum value of |y|, say ymax, always occurs at x = 0, and the maximum value of |x| always occurs at exactly one value of y between 0 and ymax (this assumes a = -1, b = 1). The rectangular domain DRect corresponds to a complex ball, say z1. The function g is then evaluated in complex ball arithmetic with z1 as argument: z2 = g(z1). The absolute value of z2 is a real ball, say x2, and the supremum of x2 is a rigorous upper bound for K (see Remark 1 of the above paper).
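For the standard DE transform phi(t) = tanh((pi/2) sinh t) on [-1, 1], the shape of DRect can be seen by just tracing the boundary curve phi(x + i d). Here is a quick Python sketch of that idea in plain double precision (boundary sampling only, so non-rigorous; a verified version would redo this in arb/acb ball arithmetic):

```python
import cmath

def de_rect(d, xmax=6.0, steps=2000):
    # Trace phi(x + i*d), with phi(t) = tanh(pi/2 * sinh(t)), over x >= 0
    # (the image is symmetric in x) and record the largest |Re| and |Im|.
    # By the maximum principle these boundary maxima give the half-width
    # and half-height of a rectangle enclosing phi(D) for the strip
    # D = {|Im t| < d}.  Sampling in doubles only: an estimate, not a
    # verified enclosure.
    re_max = im_max = im_argmax = 0.0
    for j in range(steps + 1):
        x = xmax * j / steps
        w = cmath.tanh(cmath.pi / 2 * cmath.sinh(complex(x, d)))
        re_max = max(re_max, abs(w.real))
        if abs(w.imag) > im_max:
            im_max, im_argmax = abs(w.imag), x
    return re_max, im_max, im_argmax

xr, yr, y_at = de_rect(0.5)
```

For d = 0.5 this gives roughly DRect = [-1.06, 1.06] + [-0.94, 0.94] i, with the maximum of |Im phi(x + i d)| indeed attained at x = 0, as described above.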

A neat property of this algorithm is that the number of nodes and the interval width h are automatically chosen upfront in a way which balances the discretization error and the truncation error, resulting in a "nearly optimal" number of function evaluations. Another nice point is that algebraic singularities of the "beta density" type are automatically covered, taking into account the ratio of alpha and beta for optimal performance.

Contrary to what is claimed in the paper, in my experience the choice of d can have a great impact on n = M + N + 1, the total number of function evaluations required to achieve the desired absolute error. To select a near-optimal value of d, I usually calculate n for a range of values of d, from d = 1.5 down to d = 0.1, using pre-computed values of the matching rectangular domains DRect to save time, and choose the d which results in the smallest n.
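To illustrate why such a scan pays off, here is a toy Python version of it. Both the error model (h chosen so that K * exp(-2*pi*d/h) ~ eps, nodes counted until the DE weight falls below eps) and the growth model for K(d) are invented placeholders for illustration only, not the bounds from the paper:

```python
import math

def nodes_for_d(d, eps=1e-15):
    # Toy model for K(d): it blows up as the strip widens toward the
    # nearest singularity of the transformed integrand at d = pi/2.
    K = math.exp(1.0 / (math.pi / 2 - d))
    # Pick h so the discretization-error model K * exp(-2*pi*d/h) ~ eps.
    h = 2 * math.pi * d / math.log(K / eps)
    # Count symmetric nodes until the DE weight (times K) drops below
    # eps; the weights decay double-exponentially in k.
    k = 0
    while True:
        w = h * (math.pi / 2) * math.cosh(k * h) \
            / math.cosh(math.pi / 2 * math.sinh(k * h)) ** 2
        if w * K < eps:
            break
        k += 1
    return 2 * k + 1  # n = M + N + 1 with M = N = k

ds = [0.1 * i for i in range(1, 16)]  # d = 0.1 .. 1.5
counts = {d: nodes_for_d(d) for d in ds}
best_d = min(counts, key=counts.get)
```

Even in this toy model the optimum lies strictly inside the scanned range: small d forces a tiny step h, while d close to pi/2 inflates K, so n first falls and then rises again.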

There is also a variant of this algorithm which covers logarithmic singularities: Tomoaki Okayama: Explicit error bound for the tanh rule and the DE formula for integrals with logarithmic singularity. JSIAM Letters, vol. 6 (2014), pp. 9–11. https://www.jstage.jst.go.jp/article/jsiaml/6/0/6_9/_pdf

Both algorithms should fit nicely into the more elaborate procedure which you have already established. This includes the need to use subintervals and to be careful with branch cuts. So an additional feature request for later releases would be to include these (or similar, DE based) algorithms as possible choices for verified integration.

Finally, there is literature covering the use of DE based algorithms for verified integration over infinite intervals. The challenge is how to calculate K (and I have absolutely no idea how to do this in an automated way). Here is a reference: Okayama, T., Matsuo, T., & Sugihara, M. (2013). Error estimates with explicit constants for Sinc approximation, Sinc quadrature and Sinc indefinite integration. Numerische Mathematik, 124, 361–394. https://interval.louisiana.edu/reliable-computing-journal/volume-19/reliable-computing-19-pp-045-065.pdf

Even further down the road is iterated integration: Okayama, Tomoaki (2016). Explicit Error Bound for Modified Numerical Iterated Integration by Means of Sinc Methods. In: Revised Selected Papers of the 6th International Conference on Mathematical Aspects of Computer and Information Sciences (MACIS 2015), Volume 9582, pp. 202–217. Springer-Verlag New York. https://pdfs.semanticscholar.org/8593/a23d72bfab1c7b3ecbca6597bd14e422c218.pdf

So there is a lot that could be done!

fredrik-johansson commented 6 years ago

For derivatives, an entirely separate implementation based on the trapezoidal rule should probably be used by default. For a function that is analytic in the annulus 1/R < |z-a| < R, the error of the N-point trapezoidal formula on |z-a| = 1 is bounded by 4 pi M / (R^N - 1). I did not check how this rate of convergence compares to the present code, but it should be comparable (if not better) in most cases, and no precomputed quadrature nodes are needed. An algorithm that wraps acb_calc_integrate could perhaps be optional, for the cases where there is some localized disturbance near the integration path so that splitting the path works better.
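A plain double-precision sketch of that trapezoidal scheme (illustrative only, not the proposed arb implementation; the function name is made up):

```python
import cmath

def taylor_coeff(f, a, k, r=1.0, N=16):
    # N-point trapezoidal rule for the Cauchy integral
    #   c_k = (1/(2*pi*i)) \oint_{|z-a|=r} f(z) / (z-a)^(k+1) dz,
    # i.e. the k-th Taylor coefficient f^(k)(a) / k!.  For f analytic
    # in 1/R < |z-a| < R, the error on the circle r = 1 is bounded by
    # 4*pi*M / (R^N - 1) as noted above, so convergence is geometric.
    s = 0.0 + 0.0j
    for m in range(N):
        t = cmath.exp(2j * cmath.pi * m / N)
        s += f(a + r * t) / t ** k
    return s / (N * r ** k)

c1 = taylor_coeff(cmath.exp, 0.0, 1)  # exp'(0)/1! = 1
```

For an entire function like exp the only error is aliasing from the coefficients c_{k+N}, c_{k+2N}, ..., so already N = 16 gives c1 to near machine precision.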

I think the user would have to specify the actual radius |z-a| = r to use for the integration path (but R would be determined automatically). Actually, this is definitely needed; we can find a pole-free annulus, but if there is a removable singularity at a (or if there is a pole and we want Laurent coefficients), there is no way to detect automatically whether there are also additional singularities very close to a. So the specification would be to compute

1/(2 pi i) \int_{|z-a|=r} f(z) / (z-a)^(k+1) dz

for given a, r, k. In fact, the interface should be to compute this for k, k+1, ..., k+len-1 for given (k, len), giving the truncated Laurent series at once. Doing this simultaneously will be much more efficient since the function values can be recycled. This is an especially big advantage over acb_calc_integrate; extending acb_calc_integrate to evaluate a vector of integrals efficiently would be much more work.
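The value-recycling is easy to see in a double-precision sketch: one batch of N evaluations on the circle yields every requested coefficient (the interface names here are made up for illustration, not a proposed arb API):

```python
import cmath

def coeff_batch(f, a, k0, length, r=1.0, N=32):
    # Evaluate f once at the N trapezoidal nodes on |z - a| = r, then
    # reuse those values for all coefficients c_{k0}, ..., c_{k0+length-1}
    # of the truncated Laurent/Taylor series at a.
    vals = [f(a + r * cmath.exp(2j * cmath.pi * m / N)) for m in range(N)]
    coeffs = []
    for k in range(k0, k0 + length):
        s = sum(v * cmath.exp(-2j * cmath.pi * m * k / N)
                for m, v in enumerate(vals))
        coeffs.append(s / (N * r ** k))
    return coeffs

c = coeff_batch(cmath.exp, 0.0, 0, 4)  # ~ [1/0!, 1/1!, 1/2!, 1/3!]
```

The inner sums are exactly a length-N DFT of the node values, so in a real implementation the whole batch costs N function evaluations plus one FFT.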

There should also be some flag to indicate conjugate and even/odd symmetry of the integrand around the point, allowing 1/2 or 1/4 of the evaluation points to be used.
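For example, with conjugate symmetry (f real-analytic and a real, so f(conj z) = conj f(z)) the terms at conjugate nodes pair up and only the closed upper half circle needs to be evaluated. A double-precision sketch of the halving (hypothetical helper, not an arb API):

```python
import cmath

def coeff_conj_sym(f, a, k, r=1.0, N=16):
    # Trapezoidal rule as before, but for f real-analytic and a real:
    # the terms at nodes m and N - m are complex conjugates, so only
    # m = 0 .. N/2 are evaluated and the rest contribute 2*Re(term).
    assert N % 2 == 0
    s = 0.0
    for m in range(N // 2 + 1):
        t = cmath.exp(2j * cmath.pi * m / N)
        term = f(a + r * t) * t ** (-k)
        # the nodes at m = 0 and m = N/2 are their own conjugates
        s += term.real if m in (0, N // 2) else 2 * term.real
    return s / (N * r ** k)

d1 = coeff_conj_sym(cmath.exp, 0.0, 1)  # ~ exp'(0) = 1
```

Adding even/odd symmetry on top of this would fold the sum again onto a quarter circle, giving the 1/4 evaluation count mentioned above.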