GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Integral of discretized functional data #619

Open pcuestas opened 4 months ago

pcuestas commented 4 months ago

This issue addresses the definition of the FData.integrate() method for both FDataGrid and FDataIrregular objects; in particular, how this integral should be defined for any discretized functional observation.

To simplify, I will speak about $[a, b] \to \mathbb{R}$ functional observations.

(This issue is related to #609.)


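Consider a discretized functional observation x (an element of an FDataGrid or an FDataIrregular object), observed at the grid points = { $t_0, \dots, t_{M-1}$ }, with $a \leq t_0 < \dots < t_{M-1} \leq b$.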

Currently, the integral of such an observation x is computed with the function scipy.integrate.simpson called like this:

scipy.integrate.simpson(x(points), points)

meaning that the integral is computed over the interval $[t_0, t_{M-1}]$. That is, we are computing

$$\int_{[t_0, t_{M-1}]} x(t)\ dt,$$

when we should be computing

$$\int_{[a, b]} x(t)\ dt = \int_{[a, t_0]} x(t)\ dt + \int_{[t_0, t_{M-1}]} x(t)\ dt + \int_{[t_{M-1}, b]} x(t)\ dt.$$
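To make the gap concrete, here is a minimal sketch that uses plain NumPy arrays instead of FDataGrid objects. The domain $[0, 1]$, the grid, and the test function $x(t) = t^2$ are illustrative assumptions, not taken from the package.

```python
import numpy as np
from scipy.integrate import simpson

# Assumed domain [a, b] and an irregular grid that does not reach its ends.
a, b = 0.0, 1.0
points = np.array([0.1, 0.3, 0.4, 0.7, 0.9])
values = points ** 2  # x(t) = t^2 evaluated on the grid

# What the current call computes: the integral over [t_0, t_{M-1}] only.
inner = simpson(values, x=points)

# Exact integral of t^2 over the whole domain [a, b], for comparison.
exact = (b ** 3 - a ** 3) / 3.0

# The two end pieces [a, t_0] and [t_{M-1}, b] that are not accounted for.
missing = exact - inner
print(f"inner={inner:.6f}  exact={exact:.6f}  missing={missing:.6f}")
```

The difference `missing` is precisely the sum of the two end integrals that the current computation leaves out.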

So, the problem arises:

How to calculate $\int_{[a, t_0]} x(t)\ dt$ and $\int_{[t_{M-1}, b]} x(t)\ dt$?

There are formulas for numeric integration when the ends of the integral are open (i.e. when either $a \neq t_0$ or $t_{M-1} \neq b$). However, we have only found them for equally spaced grids, which is not generally our case.

Some weeks ago, I talked briefly with @vnmabus and Alberto Suarez about how we should solve this. No firm conclusion was reached, although Alberto argued that a very simple solution is as good as any, and we did not find anything wrong with that reasoning. In that spirit, I'll write down the two options that were proposed and open the discussion of which should be chosen (among these or other possible solutions).

Option 1: fill the ends with a constant value

Approximate the integrals as follows:

$$\int_{[a, t_0]} x(t)\ dt :\approx \int_{[a, t_0]} x(t_0)\ dt = x(t_0) (t_0 - a),$$

$$\int_{[t_{M-1}, b]} x(t)\ dt :\approx \int_{[t_{M-1}, b]} x(t_{M-1})\ dt = x(t_{M-1}) (b - t_{M-1}).$$
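Option 1 could be sketched as follows, again on plain NumPy arrays. The function name `integrate_constant_ends` and its signature are hypothetical and are not part of scikit-fda.

```python
import numpy as np
from scipy.integrate import simpson


def integrate_constant_ends(values, points, a, b):
    """Hypothetical helper: Simpson on the grid plus constant end pieces."""
    values = np.asarray(values, dtype=float)
    points = np.asarray(points, dtype=float)

    # Integral over [t_0, t_{M-1}], as in the current computation.
    inner = simpson(values, x=points)

    # End pieces: x(t_0) (t_0 - a) and x(t_{M-1}) (b - t_{M-1}).
    left = values[0] * (points[0] - a)
    right = values[-1] * (b - points[-1])

    return inner + left + right


# Illustrative data (assumed): x(t) = t^2 on [0, 1], irregular grid.
points = np.array([0.1, 0.3, 0.4, 0.7, 0.9])
result = integrate_constant_ends(points ** 2, points, a=0.0, b=1.0)
print(result)
```

When $a = t_0$ and $b = t_{M-1}$ both end terms vanish, so the result coincides with the current computation.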

Option 2: redefine the ends of the domain and integrate over the "augmented grid"

Redefine the grid to points_augmented = { $a, t_0, \dots, t_{M-1}, b$ }, set $x(a) := x(t_0)$ and $x(b) := x(t_{M-1})$, and compute the integral of x over $[a, b]$ as

scipy.integrate.simpson(x(points_augmented), points_augmented)
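A minimal sketch of Option 2 on plain NumPy arrays follows. The helper name `integrate_augmented_grid` is hypothetical. The guards that skip an end point already present in the grid are an added assumption, meant to avoid duplicated abscissae (zero-width intervals) in the call to `simpson`.

```python
import numpy as np
from scipy.integrate import simpson


def integrate_augmented_grid(values, points, a, b):
    """Hypothetical helper: Simpson over the grid extended to [a, b]."""
    values = np.asarray(values, dtype=float)
    points = np.asarray(points, dtype=float)

    # Prepend a with x(a) := x(t_0), unless the grid already starts at a.
    if a < points[0]:
        points = np.concatenate(([a], points))
        values = np.concatenate(([values[0]], values))

    # Append b with x(b) := x(t_{M-1}), unless the grid already ends at b.
    if b > points[-1]:
        points = np.concatenate((points, [b]))
        values = np.concatenate((values, [values[-1]]))

    return simpson(values, x=points)


# Illustrative data (assumed): x(t) = t^2 on [0, 1], irregular grid.
points = np.array([0.1, 0.3, 0.4, 0.7, 0.9])
result = integrate_augmented_grid(points ** 2, points, a=0.0, b=1.0)
print(result)
```

Because Simpson's rule fits parabolas through neighbouring points of the augmented grid, the added end points also influence the adjacent intervals, so this result need not coincide with the one from Option 1.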