FormingWorlds / PROTEUS

Coupled atmosphere-interior framework to simulate the temporal evolution of rocky planets.
https://fwl-proteus.readthedocs.io
Apache License 2.0

PROTEUS grid-search #204


timlichtenberg commented 5 days ago

To move towards an inverse method for PROTEUS sometime down the road, we need a computationally feasible approach for running many models to fit a given set of observations.

To give an example of the problem: let's assume a given exoplanet has the following known/observed parameters with uncertainties: stellar age, orbital distance, planet radius, planet mass, transmission/emission spectrum. Given these parameters, we would like to find the best-fitting PROTEUS models over a set of input parameters and compute a goodness-of-fit metric. This is essentially the description of an atmospheric retrieval, except that PROTEUS simulations are far too computationally expensive to perform 100k+ of them.
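As a minimal sketch of what such a goodness-of-fit metric could look like, assuming Gaussian uncertainties on the observed quantities; the parameter names, values, and the way model outputs are passed in are hypothetical placeholders, not part of the current code base:

```python
import numpy as np

# Hypothetical observed values and 1-sigma uncertainties for a single planet.
# Names and numbers are illustrative only.
observed = {
    "stellar_age_gyr":     (4.5, 0.5),
    "orbital_distance_au": (0.05, 0.002),
    "planet_radius_re":    (1.4, 0.1),
    "planet_mass_me":      (3.0, 0.4),
}

def log_likelihood(model_outputs: dict) -> float:
    """Gaussian log-likelihood comparing one forward model to the observations.

    `model_outputs` maps the same keys to the values predicted by a single
    PROTEUS run; a spectrum would add one chi-squared term per wavelength bin.
    """
    chi2 = 0.0
    for key, (obs, sigma) in observed.items():
        chi2 += ((model_outputs[key] - obs) / sigma) ** 2
    return -0.5 * chi2

# Example: score one (hypothetical) set of model outputs.
example_model = {
    "stellar_age_gyr": 4.2,
    "orbital_distance_au": 0.051,
    "planet_radius_re": 1.5,
    "planet_mass_me": 2.8,
}
print(log_likelihood(example_model))
```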

I am not yet certain what the best strategy is for approaching this problem. Here are a few options, each with opportunities and drawbacks:

nichollsh commented 4 days ago

I agree that this would be incredibly powerful. I can imagine that running an MCMC (or similar method) would be tricky because of the slow runtimes. When we are ready to look into this, maybe we could involve someone who has experience doing retrievals with large models?
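To make the runtime concern concrete, here is a rough sketch of how many forward-model calls even a modest ensemble MCMC would need. The sampler settings and parameter dimensionality are illustrative, and the log-probability below is a trivial stand-in for what would otherwise be a full PROTEUS run:

```python
import numpy as np
import emcee

# Illustrative retrieval dimensions and sampler settings (assumed, not tuned).
ndim, nwalkers, nsteps = 4, 32, 2000

calls = 0

def log_prob(theta):
    """Stand-in for a likelihood that would require one full PROTEUS run."""
    global calls
    calls += 1
    # A real implementation would launch a simulation here and compare its
    # outputs to the observations; we use a cheap Gaussian instead.
    return -0.5 * np.sum(theta ** 2)

p0 = np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, nsteps)

# 32 walkers x 2000 steps is ~64,000 forward-model evaluations: at hours per
# PROTEUS run, this is clearly infeasible without a surrogate or emulator.
print(f"forward-model calls: {calls}")
```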

nichollsh commented 4 days ago

The ML paper you cited is interesting: they ran 50k simulations to train the model. I am finding that a grid of 22 simulations takes about 14 hours to run (on 22 threads), i.e. roughly 14 hours per simulation. If we scaled this to 50k simulations on 256 threads, it would take 50000 × 14 / 256 ≈ 2700 hours ≈ 114 days. We could of course speed this up by reducing the resolution, etc.
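For reference, a back-of-the-envelope helper for these wall-clock estimates, assuming one simulation per thread and perfect parallel scaling (which is optimistic):

```python
def wallclock_days(n_sims: int, hours_per_sim: float, n_threads: int) -> float:
    """Estimated wall-clock time for an embarrassingly parallel grid,
    assuming one simulation per thread and perfectly balanced execution."""
    return n_sims * hours_per_sim / n_threads / 24.0

# 22 simulations on 22 threads took ~14 h, i.e. ~14 h per simulation.
print(wallclock_days(50_000, 14, 256))   # ~114 days
print(wallclock_days(50_000, 14, 1024))  # ~28 days on a larger allocation
```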

timlichtenberg commented 4 days ago

I believe they need fewer simulations than a "normal" Bayesian model, which is one of their selling points. Nevertheless, even 100k simulations are not impossible when using a large-scale computing facility. We can and should do this sometime in the next year to build a large simulation grid, once the current plans with aragog and zephyrus are done. Cosmology solves this problem by running updated large-scale forward models every few years with high-performance codes (e.g. the TNG project) and then training machine-learning models on them. That is one way to go, but if we can find an algorithm that enables running highly specialised simulations to compute the Bayesian evidence directly for a single planet on a ~week(s) timescale, that would be preferable, I think.
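As a rough sketch of the grid-then-emulate approach described above: train a cheap surrogate on a precomputed simulation grid and call it inside the inference loop instead of the full simulation. The grid shape, the toy output function, and the choice of a Gaussian-process regressor are assumptions for illustration, not a proposal for a specific architecture:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Stand-in for a precomputed simulation grid: each row is one PROTEUS run with
# inputs theta (e.g. volatile inventory, instellation, ...) and a scalar output
# y (e.g. final atmospheric mass). Here we fake it with a cheap toy function.
rng = np.random.default_rng(0)
theta_grid = rng.uniform(0.0, 1.0, size=(500, 3))
y_grid = np.sin(3 * theta_grid[:, 0]) + theta_grid[:, 1] ** 2 - theta_grid[:, 2]

# Train the emulator once, offline, on the expensive grid.
emulator = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
emulator.fit(theta_grid, y_grid)

def log_likelihood(theta, obs=0.5, sigma=0.1):
    """Fast likelihood using the emulator instead of a full simulation."""
    pred = emulator.predict(theta.reshape(1, -1))[0]
    return -0.5 * ((pred - obs) / sigma) ** 2

# Each likelihood call now costs milliseconds, so the 100k+ evaluations needed
# by an MCMC or nested-sampling run become feasible on a single machine.
print(log_likelihood(np.array([0.2, 0.5, 0.7])))
```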