Closed andrewcz closed 4 months ago
Hi @andrewcz, the variance of the time series is imposed by the Scattering Spectra: the sum of the 'variance' coefficients (those corresponding to the symbol $\Phi_2$) is approximately equal to the variance of the time series, so imposing them imposes the variance. If you want a perfect match between the observed variance and the generated variance, you can manually set the generated data to the correct mean and variance through an affine transform ($x \mapsto a x + b$); this would not deteriorate the Scattering Spectra matching.
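A minimal sketch of the affine rescaling described above, using NumPy only. The function name `match_mean_and_variance` and the random stand-in arrays are illustrative assumptions and are not part of the package.

```python
import numpy as np

def match_mean_and_variance(generated, observed):
    """Apply x -> a*x + b so that `generated` gets the mean and variance of `observed`."""
    a = np.std(observed) / np.std(generated)          # scale factor fixes the variance
    b = np.mean(observed) - a * np.mean(generated)    # shift fixes the mean
    return a * generated + b

# Stand-in data (assumption): random series in place of real observed/generated returns.
rng = np.random.default_rng(0)
observed = 0.01 * rng.standard_normal(6000)
generated = 0.012 * rng.standard_normal(6000) + 0.001

rescaled = match_mean_and_variance(generated, observed)
```

After this step, `rescaled` has the same sample mean and sample variance as `observed`, up to floating-point error.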
Thank you, very generous @RudyMorel. Follow-up question: can we estimate the convergence of the simulation? How long would it take to move to the true distribution?
Hi @andrewcz, making sure the algorithm has converged to a trajectory sampled from the maximum entropy model $p_\theta$ is very hard (because of limited data and the high dimension of the Scattering Spectra; see the paper). The algorithm instead considers the convergence of the Scattering Spectra estimated on the simulated data to the target Scattering Spectra (estimated on the observed data). On time series of length 6000 with Scattering Spectra of dimension ~300, you typically need a few hundred gradient steps. Depending on the machine, this can take anywhere from ~5s to ~1000s. The current package does not offer acceleration schemes, but there are numerous techniques if one were to generate tens of thousands of trajectories (e.g. generating trajectories in batches).
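To illustrate what "convergence of the statistics" can mean in practice, here is a small sketch of a relative-error check between a target statistics vector and the statistics of the current simulated series. The function names, the tolerance, and the toy vectors are assumptions for illustration; this is not the stopping criterion used by the package.

```python
import numpy as np

def relative_error(target_stats, current_stats):
    """Relative L2 distance between current statistics and target statistics."""
    target_stats = np.asarray(target_stats, dtype=float)
    current_stats = np.asarray(current_stats, dtype=float)
    return np.linalg.norm(current_stats - target_stats) / np.linalg.norm(target_stats)

def has_converged(target_stats, current_stats, tol=1e-2):
    """True once the statistics match the target within the (hypothetical) tolerance."""
    return relative_error(target_stats, current_stats) < tol

# Toy vectors standing in for the ~300-dimensional Scattering Spectra.
target = np.array([1.0, 0.5, 0.25])
far = np.array([1.5, 0.2, 0.4])
close = np.array([1.001, 0.499, 0.2505])
```

In a gradient-descent loop, a check like `has_converged(target, current)` would be evaluated every few steps to decide when to stop.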
@RudyMorel you're a legend! Thank you! Is this research you're possibly continuing? Happy to close this comment; will try an example over the weekend.
I appreciate your interest in our work.

We have a follow-up paper doing conditional generation in finance:
paper: https://arxiv.org/abs/2308.01486
github: https://github.com/RudyMorel/shadowing

We also used the Scattering Spectra for source separation on seismic time series:
paper: https://proceedings.mlr.press/v202/siahkoohi23a/siahkoohi23a.pdf
github: https://github.com/alisiahkoohi/srcsep

We are currently working on other follow-ups.
@RudyMorel I’m a little confused. If I have a series of daily returns and I want to generate multiple paths - what is the best way to do this to estimate the “true” distribution?
The pathshadowing method or the scattering spectra?
The underlying distribution $p(x)$ of the data is not accessible. The Scattering Spectra ("Scale dependencies ..." paper) is really about the generative model: proposing a model $p_\theta(x)$ of $p(x)$. The "Path-Shadowing Monte-Carlo" paper tackles conditional generation: we propose a model of $p(x \mid x_\text{past})$ that relies entirely on our model $p_\theta(x)$ of $p(x)$.
If you want to generate multiple paths, the simplest way is to call the "generate" function multiple times. If you don't touch the seed, each call will output different generated data. There are other ways of generating a "batch" of data, but they are not well handled by the current code.
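As a sketch of the "call it multiple times" approach, the helper below repeatedly calls a generator and stacks the results. Both `generate_paths` and `toy_generate` are hypothetical names: `toy_generate` is a random placeholder standing in for the package's "generate" function, whose actual signature is not shown in this thread.

```python
import numpy as np

def generate_paths(generate_fn, n_paths):
    """Call generate_fn() n_paths times and stack the outputs into shape (n_paths, T)."""
    return np.stack([np.asarray(generate_fn()) for _ in range(n_paths)])

# Placeholder generator (assumption): unseeded, so every call gives a different path.
rng = np.random.default_rng()

def toy_generate(length=6000):
    return 0.01 * rng.standard_normal(length)

paths = generate_paths(toy_generate, n_paths=5)
```

Each row of `paths` is one independent path; replacing `toy_generate` with a call into the real model would keep the same structure.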
@RudyMorel thank you, much appreciated. So I assume that if the goal is just to generate paths, the example in the scattering spectra should be suitable. Does this automatically take into account non-stationarity and autocorrelation of the returns, since it copies the exact path? Best, Andrew
Yes, our generative model is coded here, in the "scattering_spectra" repo (you'll see that the "shadowing" repo relies on the "scattering_spectra" repo). The model assumes stationarity of the increments, so you need to feed it the log-prices or the log-returns. The Scattering Spectra contain an estimation of the auto-correlation (through the coefficients $\Phi_2$, see the 2 papers). The Scattering Spectra model does not copy the exact data. Briefly, here's how it works:
[Image: generate]
So each time you go through the procedure, a new $x$ is output.
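For the preprocessing mentioned above (feeding the model log-prices or log-returns), here is a minimal NumPy sketch that turns a price series into log-returns and computes a plain empirical autocorrelation. The function names and the toy prices are assumptions for illustration, and the empirical autocorrelation shown is not the $\Phi_2$ coefficients themselves.

```python
import numpy as np

def log_returns(prices):
    """Log-returns r_t = log(p_t) - log(p_{t-1}) of a positive price series."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def autocorrelation(x, max_lag):
    """Biased empirical autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])

# Toy prices (assumption), standing in for a real daily price series.
prices = np.array([100.0, 101.0, 99.5, 100.2, 102.0, 101.3])
r = log_returns(prices)
acf = autocorrelation(r, max_lag=2)
```

Comparing `autocorrelation` on observed and generated returns is one simple way to check that a generated path reproduces the dependence structure without being a copy of the data.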
Thank you @RudyMorel, have a great weekend!
Hi @RudyMorel, what is the behaviour of the variance of the generated series? Does it vary in size automatically? Cheers, Andrew