Euler_maruyama function

psotom commented 7 months ago

References to issues or other PRs

Describe the proposed changes

Included a function that makes numerical integration of an Îto SDE using the euler_maruyama shceme. It has been included in the datasets module to extend the functionality dedicated to generate synthetic data.

Additional information

Checklist before requesting a review

[x] I have performed a self-review of my code
[ ] The code conforms to the style used in this package
[x] The code is fully documented and typed (type-checked with Mypy)
[x] I have added thorough tests for the new/changed functionality

vnmabus commented 5 months ago

So, for me the only pending issues for accepting this PR are the parameters diffusion_matricial_term and dim_noise:

For diffusion_matricial_term, I do not get what does that mean in the model. Please, explain it to me (with equations, if possible) to see if it is necessary.
From the equation it seems to me that the parameter dim_noise can be computed from the diffusion, and should not be passed by the user. Again, if that is not correct, please explain.

psotom commented 5 months ago

So, for me the only pending issues for accepting this PR are the parameters diffusion_matricial_term and dim_noise:

For diffusion_matricial_term, I do not get what does that mean in the model. Please, explain it to me (with equations, if possible) to see if it is necessary.

From the equation it seems to me that the parameter dim_noise can be computed from the diffusion, and should not be passed by the user. Again, if that is not correct, please explain.

To answer your question, I will explain some context:

The formula of the SDE we want to model is the following:

$$d\mathbf{X}(t) = \mathbf{F}(t, \mathbf{X}(t))dt + \mathbf{G}(t, \mathbf{X}(t))d\mathbf{W}(t).$$

In this equation,

$\mathbf{X} = (X^{(1)}, X^{(2)}, ... , X^{(q)})\in \mathbb{R}^q$ is a vector that represents the stochastic process whose values we wish to simulate. The superindeces indicate the coordinate of the vector.
- The function $\mathbf{F}: \mathbb{R} \times \mathbb{R}^q \rightarrow \mathbb{R}^q$ defines the drift term, which is the deterministic component of the equation.
- The function $\mathbf{G}: \mathbb{R} \times \mathbb{R}^q \rightarrow \mathbb{R}^q \times \mathbb{R}^m$ defines the diffusion term and refers to the stochastic component of the evolution.
$\mathbf{W}(t) \in \mathbb{R}^m$ refers to a Wiener process (Standard Brownian motion) of dimension $m$. The components of this process are independent.
$q$ refers to the dimension of the variable $\mathbf{X}$ (dimension of the codomain).
$m$ is the dimension of the noise.

diffusion_matricial_term justification:

The main reason for the use of this parameter is to add flexibility to the input of the SDE arguments.

As explained above the diffusion term is a matrix. However, many times is independent on $X$ and we can consider it as a scalar. We have two options:

Treat everything as a matrix. For the 1-dimensional case an input such as diffusion=1 is not valid and diffusion=[[1]] would be needed. Also, if $\mathbf{X}$ is multidimensional, if you want to add a constant diffusion, it must given by a multiple of the identity matrix with the corresponding dimensions.
Distinguish between scalar and matrix diffusion terms. In this case, we can define for example diffusion=1 and it will work no matter what are the dimensions of the process. In general, this approach facilitates a lot the way the diffusion can be defined. For this distinction we cannot only rely on the diffusion function, but we need the diffusion_matricial_term parameter.

Additionally, if we choose option 2 we can add "for free" the possibility of the diffusion term being a vector. This is specially useful when we want to simulate a process in each of the coordinates.

For example, if I want to simulate a Geometric Brownian motion ($dX = \mu X dt + \sigma X dW_t$) in each coordinate, the definition of the diffusion for both options would be:

def gbm_option1_diffusion(t: float, x: NDArrayFloat) -> NDArrayFloat:
    """I must define a multiple of the identity matrix for this case."""
    diffusion = np.zeros((n_samples, dim_codomain, dim_codomain))
    for i in np.arange(dim_codomain):
        diffusion[:, i, i] = sigma * x[:, i]
    return diffusion

def gbm_option2_diffusion(t: float, x: NDArrayFloat) -> NDArrayFloat:
    return sigma * x

Wrapping everything up, I think option 2 (which includes diffusion_matricial_term) is better because it increases a lot the flexibility in the input of the function.

dim_noise:

After close examination I think I have found a way to compute the dimension of the noise without needing the parameter. I will make a commit.

vnmabus commented 5 months ago

But what I do not get is why it is not possible to just detect the appropriate version depending on the diffusion parameter passed, instead of forcing the user to pass an explicit boolean parameter. Is there any situation that could lead to ambiguity?

psotom commented 5 months ago

But what I do not get is why it is not possible to just detect the appropriate version depending on the diffusion parameter passed, instead of forcing the user to pass an explicit boolean parameter. Is there any situation that could lead to ambiguity?

The only way I have found to differentiate the different cases using only the diffusion parameter is by looking at the shape of the return of the diffusion. However, there are different cases which potentially have the same shape. For example, a scalar diffusion which depends on the value of $X$ will return a 1-dimensional vector. A diffusion which does not depend on $X$ and is given by a vector will also return a 1-dimensional vector. If n_samples = dim_codomain, then the shape of this array is the same and there is no way to distinguish them. That is why we need the extra parameter.

vnmabus commented 5 months ago

I think I see the problem better now. I will accept this as is, and lets see if it is possible to remove it in the future.

vnmabus commented 3 months ago

@all-contributors Please add @psotom for code, example, ideas, test, research and documentation.

allcontributors[bot] commented 3 months ago

@vnmabus

I've put up a pull request to add @psotom! :tada:

GAA-UAM / scikit-fda