Increase variance of the generated series

andrewcz commented 4 months ago

Hi @RudyMorel, how does the variance of the generated series behave? Does it vary in size automatically? Cheers, Andrew

RudyMorel commented 4 months ago

Hi @andrewcz, the variance of the time-series is imposed by the Scattering Spectra (the 'variance' coefficients corresponding to the symbol $\Phi_2$): their sum is approximately equal to the variance of the time-series, so imposing them imposes the variance. If you want a perfect match between observed and generated variance, you can manually set the generated data to the correct mean and variance through an affine transform ($x \mapsto a x + b$), and that would not deteriorate the Scattering Spectra matching.
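A minimal sketch of that affine transform, assuming the observed and generated series are 1-D NumPy arrays. The function name `match_mean_and_variance` is made up for illustration and is not part of the package:

```python
import numpy as np


def match_mean_and_variance(x_gen, x_obs):
    """Rescale x_gen with x -> a * x + b so its mean and std equal those of x_obs.

    Hypothetical helper for illustration only (not part of scattering_spectra).
    """
    x_gen = np.asarray(x_gen, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    a = x_obs.std() / x_gen.std()          # scale fixes the variance
    b = x_obs.mean() - a * x_gen.mean()    # shift fixes the mean
    return a * x_gen + b


# Synthetic stand-ins for observed and generated data.
rng = np.random.default_rng(0)
x_obs = 0.01 * rng.standard_normal(1000) + 0.0005
x_gen = 0.008 * rng.standard_normal(1000)
x_fixed = match_mean_and_variance(x_gen, x_obs)
```

After the call, `x_fixed` has the same sample mean and sample variance as `x_obs` up to floating-point error.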

andrewcz commented 4 months ago

Thank you, very generous @RudyMorel. Follow-up question: can we estimate the convergence of the simulation? How long would it take to move to the true distribution?

RudyMorel commented 4 months ago

Hi @andrewcz, making sure the algorithm converged to a trajectory sampled from the maximum entropy model $p_\theta$ is very hard (because of limited data and the high dimension of the Scattering Spectra, see the paper). The algorithm instead considers the convergence of the Scattering Spectra estimated on the simulated data to the target Scattering Spectra (estimated on the observed data). On time-series of length 6000 with Scattering Spectra of dimension ~300, you typically need a few hundred gradient steps. Depending on the machine, this can take anywhere from ~5s to ~1000s. The current package does not offer acceleration schemes, but there are numerous techniques if one were to generate tens of thousands of trajectories (e.g. generating trajectories in batches).

andrewcz commented 4 months ago

@RudyMorel you're a legend! Thank you! Is this research you're possibly continuing? Happy to close this comment. I will try an example over the weekend.

RudyMorel commented 4 months ago

I appreciate your interest in our work. We have a follow-up paper doing conditional generation in finance:
paper: https://arxiv.org/abs/2308.01486
github: https://github.com/RudyMorel/shadowing

We also used the Scattering Spectra for source separation on seismic time-series:
paper: https://proceedings.mlr.press/v202/siahkoohi23a/siahkoohi23a.pdf
github: https://github.com/alisiahkoohi/srcsep

We are currently working on other follow-ups.

andrewcz commented 4 months ago

@RudyMorel I’m a little confused. If I have a series of daily returns and I want to generate multiple paths - what is the best way to do this to estimate the “true” distribution?

andrewcz commented 4 months ago

The pathshadowing method or the scattering spectra?

RudyMorel commented 4 months ago

The underlying distribution $p(x)$ of the data is not accessible. The Scattering Spectra ("Scale dependencies ..." paper) is really about the generative model: proposing a model $p_\theta(x)$ of $p(x)$. The "Path-Shadowing Monte-Carlo" paper tackles conditional generation: we propose a model of $p(x \mid x_{\text{past}})$ that relies entirely on our model $p_\theta(x)$ of $p(x)$.

If you want to generate multiple paths, the simplest way is to call the "generate" function multiple times. If you don't touch the seed, each call will output different generated data. There are other ways of generating a "batch" of data, which are not well handled by the current code.
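As a rough sketch of that loop: the helper below takes any callable that maps an observed series to one generated series and calls it repeatedly. The names `generate_paths` and `placeholder_generate` are made up for illustration. In particular, `placeholder_generate` is not the package's "generate" function (its exact signature is not assumed here); it is only Gaussian noise, included so the snippet runs on its own.

```python
import numpy as np


def generate_paths(generate_fn, x_obs, n_paths):
    """Call generate_fn(x_obs) n_paths times and stack the results into an array
    of shape (n_paths, len(x_obs)). Hypothetical helper for illustration only."""
    return np.stack([np.asarray(generate_fn(x_obs)) for _ in range(n_paths)])


# Placeholder generator: NOT the Scattering Spectra model. It draws Gaussian
# noise with the mean and std of x_obs. The unseeded RNG mimics "not touching
# the seed": every call returns a different path.
_rng = np.random.default_rng()


def placeholder_generate(x_obs):
    return x_obs.mean() + x_obs.std() * _rng.standard_normal(len(x_obs))


# Synthetic stand-in for a series of daily log-returns.
x_obs = 0.01 * np.random.default_rng(0).standard_normal(1000)
paths = generate_paths(placeholder_generate, x_obs, n_paths=5)

# Any statistic can then be estimated across paths, e.g. the cumulative return.
cumulative_returns = paths.sum(axis=1)
```

To use it with the real model, one would pass a function wrapping the package's "generate" call in place of `placeholder_generate`.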

andrewcz commented 4 months ago

@RudyMorel thank you, much appreciated. So I assume that if the goal is just to generate paths, the example in the scattering spectra should be suitable. Does this automatically take into account non-stationarity and auto-correlation of the returns, since it copies the exact path? Best, Andrew

RudyMorel commented 4 months ago

Yes, our generative model is coded here, in the "scattering_spectra" repo (you'll see that the "shadowing" repo relies on the "scattering_spectra" repo). The model assumes stationarity of the increments so you need to feed it the log-prices or the log-returns. The Scattering Spectra contain an estimation of the auto-correlation (through the coefficients $\Phi_2$, see the 2 papers). The Scattering Spectra model does not copy the exact data. Briefly, here's how it works:

you feed a time-series data $\tilde{x}$ to the function generate
an estimate of the Scattering Spectra $\Phi(\tilde{x})$ is computed on $\tilde{x}$
from an original white-noise $x$ (or Brownian path), perform a gradient descent until $\|\Phi(x) - \Phi(\tilde{x})\|_2 \leq \epsilon$ is satisfied
return $x$

So each time you go through the procedure, a new $x$ is output.
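To make the four steps above concrete, here is a toy sketch in NumPy. It is not the package's implementation. As a loudly labeled assumption, $\Phi$ is replaced by a handful of empirical autocovariances (`toy_stats`), the optimizer is plain gradient descent with a backtracking step size, and the stopping rule uses a relative tolerance. All function names and parameters are made up for illustration.

```python
import numpy as np


def toy_stats(x, n_lags):
    """Toy stand-in for Phi (NOT the Scattering Spectra): autocovariances
    (1/n) * sum_t x[t] * x[t+k] for k = 0..n_lags-1, assuming x is centered."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(n_lags)])


def loss_and_grad(x, target, n_lags):
    """Squared error ||toy_stats(x) - target||_2^2 and its gradient in x."""
    n = len(x)
    residual = toy_stats(x, n_lags) - target
    grad = np.zeros(n)
    for k in range(n_lags):
        # Derivative of the lag-k autocovariance with respect to each x[s].
        d_stat = np.zeros(n)
        d_stat[: n - k] += x[k:]
        d_stat[k:] += x[: n - k]
        grad += 2.0 * residual[k] * d_stat / n
    return float(residual @ residual), grad


def toy_generate(x_obs, n_lags=8, eps=1e-3, max_steps=500, seed=None):
    """Steps 1-4 with the toy statistics; returns one new series x."""
    x_obs = np.asarray(x_obs, dtype=float)
    rng = np.random.default_rng(seed)
    target = toy_stats(x_obs - x_obs.mean(), n_lags)      # step 2
    x = x_obs.std() * rng.standard_normal(len(x_obs))     # step 3: white noise
    loss, grad = loss_and_grad(x, target, n_lags)
    step = float(len(x))
    for _ in range(max_steps):
        if np.sqrt(loss) <= eps * np.linalg.norm(target):
            break  # statistics matched up to the relative tolerance
        # Backtracking: halve the step until the loss decreases.
        while step > 1e-12:
            candidate = x - step * grad
            new_loss, new_grad = loss_and_grad(candidate, target, n_lags)
            if new_loss < loss:
                x, loss, grad = candidate, new_loss, new_grad
                step *= 2.0
                break
            step *= 0.5
        else:
            break  # no improving step found
    return x                                              # step 4


# Synthetic AR(1) series standing in for the observed data.
data_rng = np.random.default_rng(1)
x_obs = np.zeros(512)
for t in range(1, 512):
    x_obs[t] = 0.7 * x_obs[t - 1] + data_rng.standard_normal()

x_gen = toy_generate(x_obs, seed=0)
```

Each call with a different seed (or with `seed=None`) starts from a different white noise and therefore returns a different `x`, while the toy statistics of `x` are pulled towards those of `x_obs`.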

andrewcz commented 4 months ago

thank you @RudyMorel have a great weekend!

RudyMorel / scattering_spectra

Increase variance of the generated series #14