Closed lacerbi closed 1 year ago
I think the best examples we can have are ones similar to Lotka-Volterra; we can imagine some variations (for example with more than a single species, or with 'unknown' species). For a real-life dataset, there is the CIGALE simulator (Python, https://cigale.lam.fr/), which produces simulated galaxy spectra (do not ask for details). We can vary one of the main parameters (SFH, the star formation history) and produce several datasets. Here, X and Y correspond to the profile of the galaxy, and we split the profile to get the context/test sets. This has the advantage that a real dataset is available as well, but it might be a little complex to get working.
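To make the Lotka-Volterra variant concrete, here is a minimal sketch of how one synthetic task could be generated and split into context/test points. Everything specific in it is an assumption for illustration: the function names `lotka_volterra` and `make_dataset`, the uniform parameter ranges, the initial populations, the noise model, and the choice of prey population as the regression target are not taken from any existing code.

```python
import random


def lotka_volterra(alpha, beta, delta, gamma, x0, y0, t_max=10.0, n_steps=1000):
    """Integrate the two-species Lotka-Volterra ODE with classical RK4.

    dx/dt = alpha * x - beta * x * y    (prey)
    dy/dt = delta * x * y - gamma * y   (predator)

    Returns a list of (t, x, y) tuples, including the initial state.
    """
    def f(x, y):
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    h = t_max / n_steps
    x, y = x0, y0
    traj = [(0.0, x, y)]
    for i in range(1, n_steps + 1):
        k1x, k1y = f(x, y)
        k2x, k2y = f(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
        k3x, k3y = f(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
        k4x, k4y = f(x + h * k3x, y + h * k3y)
        x += h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
        traj.append((i * h, x, y))
    return traj


def make_dataset(rng, n_obs=50, n_context=10, noise_sd=0.05):
    """Sample one task: draw parameters, simulate, observe, split.

    The parameter ranges and the multiplicative noise are illustrative
    assumptions, not values from the discussion.
    """
    theta = {
        "alpha": rng.uniform(0.5, 1.5),
        "beta": rng.uniform(0.5, 1.5),
        "delta": rng.uniform(0.5, 1.5),
        "gamma": rng.uniform(0.5, 1.5),
    }
    traj = lotka_volterra(x0=1.0, y0=0.5, **theta)

    # Observe the prey population at n_obs random time points, with noise.
    idx = rng.sample(range(len(traj)), n_obs)
    obs = [(traj[i][0], traj[i][1] * (1.0 + rng.gauss(0.0, noise_sd))) for i in idx]

    # idx is already in random order, so a prefix is a random context set.
    context, target = obs[:n_context], obs[n_context:]
    return theta, context, target


theta, context, target = make_dataset(random.Random(0))
print(len(context), "context points,", len(target), "test points")
```

Each call to `make_dataset` yields one regression problem with its own latent parameters, so repeated calls give a family of related datasets. Extending this to more species would mean replacing `f` with a higher-dimensional system.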
Another possibility, very similar, is to work on meta-analysis: several regression problems that are supposed to follow the same model but produce different results. I found an article on the subject (https://www.sciencedirect.com/science/article/abs/pii/S0167947309001765), but the data is not readily available.
Finally, we could imagine a 3-level hierarchical model, \alpha -> \beta -> \mu -> x, where each point of the regression lives on the [\mu -> x] level (some x serve as inputs, the others as outputs) and the context consists of some of the \mu's (which is why we need an additional level). This is very artificial (especially the assumption that everything is i.i.d.), but it is the only case where the dimensions of the input are exchangeable.
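As a rough sketch of what sampling from such a hierarchy could look like, the snippet below draws \alpha, then \beta, then several \mu's, then i.i.d. x's for each \mu. The Gaussian distributions, the scales, the sizes, and the name `sample_task` are all assumptions made for illustration; the discussion above does not specify them.

```python
import random


def sample_task(rng, n_groups=8, dim=6, n_inputs=4, n_context_mu=3):
    """Draw one task from an assumed Gaussian chain alpha -> beta -> mu -> x."""
    alpha = rng.gauss(0.0, 1.0)    # top-level hyperparameter
    beta = rng.gauss(alpha, 1.0)   # task-level parameter
    # One mu per regression point, all drawn i.i.d. given beta.
    mus = [rng.gauss(beta, 0.5) for _ in range(n_groups)]
    # For each mu, draw `dim` i.i.d. coordinates; being i.i.d. given mu,
    # the coordinates are exchangeable.
    xs = [[rng.gauss(mu, 0.2) for _ in range(dim)] for mu in mus]
    # Some coordinates serve as inputs, the remaining ones as outputs.
    points = [(x[:n_inputs], x[n_inputs:]) for x in xs]
    # The context is a subset of the mu's.
    context_mus = mus[:n_context_mu]
    return {"alpha": alpha, "beta": beta, "context_mus": context_mus, "points": points}


task = sample_task(random.Random(1))
print(len(task["points"]), "regression points,", len(task["context_mus"]), "context mu's")
```

Because the coordinates of each x are i.i.d. given \mu, any permutation of the input dimensions leaves the problem unchanged, which is the exchangeability property mentioned above.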
Just to be clear, the ideal example is a regression problem that has:
The asteroid example is not feasible: the code is in Julia and we would need to link the two (I don't think that's possible, as it requires a lot of things on the cluster). I propose instead to work on SPDEs (https://www.scitepress.org/papers/2007/20647/20647.pdf) to model the blurring of images. There are a lot of possibilities, in particular higher dimensions and small datasets (we generate images from a similar process, e.g. a Markov field), and we can have invariances.
TODO: