HITS-AIN / Spherinator

https://space.h-its.org
Apache License 2.0

Full prototype description #29

Closed by sebastian-tg 12 months ago

sebastian-tg commented 1 year ago

Statement of the problem

Cosmological hydrodynamical simulations are excellent numerical laboratories for investigating the formation of galaxies and large-scale structure. They provide a highly detailed realization of structures in the universe across a vast range of spatial and temporal scales (7 orders of magnitude in dynamic range in both space and time). The simulation outputs are information-rich (e.g. 6D phase space + density, gas temperature and up to 100 isotope abundances for each gas/star particle, + many other fields for each mass component). This complexity is overwhelming for humans to examine and understand.

Traditionally, the problem has been approached by collapsing the rich multidimensional data onto a simplified 0D representation (single scalars) of galaxy/halo properties (e.g. stellar mass, morphology, half-light radius, mean surface brightness, bulge/disk ratio, maximum circular velocity). This approach was inspired by the data scarcity of observational astronomy, where it is much more efficient to measure relations between global galaxy properties (e.g. the Hubble diagram, the galaxy main sequence, the Tully-Fisher relation). The same holds for the ‘extrinsic’ causes of these properties, such as DM halo shape, environment, or assembly history. For more fine-grained analysis, these properties are expressed in 1D (e.g. density or mass profiles). Collapsing simulation data from >3D to 0/1D discards most of the detailed structural information and removes valuable insight into the physics behind galaxy and structure formation.
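To make the 0D/1D collapse concrete, here is a minimal Python sketch (standard library only) that reduces a galaxy's particles to two scalars and one radial profile. The particle layout `(x, y, z, mass)`, the assumption that positions are already centred on the galaxy, and the function names `total_mass`, `half_mass_radius` and `density_profile` are hypothetical choices for illustration, not part of Spherinator or of any simulation code's API.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical particle layout for this sketch: (x, y, z, mass),
# with positions already centred on the galaxy (the origin).
Particle = Tuple[float, float, float, float]


def radius(p: Particle) -> float:
    """Distance of a particle from the galaxy centre (the origin)."""
    x, y, z, _ = p
    return math.sqrt(x * x + y * y + z * z)


def total_mass(particles: Sequence[Particle]) -> float:
    """0D summary: total (e.g. stellar) mass."""
    return sum(p[3] for p in particles)


def half_mass_radius(particles: Sequence[Particle]) -> float:
    """0D summary: smallest particle radius enclosing at least half the mass."""
    half = 0.5 * total_mass(particles)
    enclosed = 0.0
    for p in sorted(particles, key=radius):
        enclosed += p[3]
        if enclosed >= half:
            return radius(p)
    raise ValueError("empty particle list")


def density_profile(particles: Sequence[Particle], edges: Sequence[float]) -> List[float]:
    """1D summary: mean mass density in spherical shells [edges[i], edges[i+1])."""
    masses = [0.0] * (len(edges) - 1)
    for p in particles:
        r = radius(p)
        for i in range(len(edges) - 1):
            if edges[i] <= r < edges[i + 1]:
                masses[i] += p[3]
                break
    # Divide each shell's mass by its volume 4/3 * pi * (r_out^3 - r_in^3).
    return [
        m / (4.0 / 3.0 * math.pi * (edges[i + 1] ** 3 - edges[i] ** 3))
        for i, m in enumerate(masses)
    ]


# Toy, made-up particles: three stars at radii 1, 2 and 3.
stars = [(1.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0), (0.0, 0.0, 3.0, 2.0)]
print(total_mass(stars))        # 4.0
print(half_mass_radius(stars))  # 2.0
```

Everything about the particles' spatial arrangement beyond these few numbers is discarded, which is the information loss described in the paragraph above.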

Proposed solution

Dimensionality reduction is a well-developed field of machine learning. It aims to create compact representations of complex, high-dimensional data that capture as much information as possible without requiring labels, making the data easier to visualize and interpret. Instead of collapsing structures to 0/1D along arbitrary projections guided by human intuition, we propose letting an unsupervised dimensionality-reduction ML model find the most efficient representation of simulated structures (in particular galaxies) in a latent space whose dimensionality is low enough to be inspected easily and interactively, even for the largest cosmological simulations. The galaxies in this space can be painted with the traditional 0D properties to aid human interpretation and to enable knowledge discovery. Furthermore, they can be painted using latent representations of the extrinsic variables (such as formation history or environment) to find and investigate the causal drivers of observed galaxy properties. The visualization is lightweight enough to run from any laptop via a web server, and it provides functionality to interactively inspect and select subsamples of the data for local analysis.
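As a rough illustration of the idea, the sketch below uses principal component analysis (PCA), a linear dimensionality-reduction method, as a stand-in for the unsupervised model; it is not the model proposed here. It projects hypothetical per-galaxy feature vectors onto a 2D latent space using power iteration with deflation, then "paints" each point with a min-max-scaled scalar property (e.g. stellar mass) that a plotting front end could map to a colour. The function names and the toy data are assumptions made for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_center(data: Sequence[Sequence[float]]) -> Matrix:
    """Subtract the per-feature mean from every row."""
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mu[j] for j in range(d)] for row in data]


def covariance(x: Matrix) -> Matrix:
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]


def mat_vec(c: Matrix, v: List[float]) -> List[float]:
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in c]


def leading_eigenvector(c: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Power iteration: repeatedly apply C to a unit vector and renormalise."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(c))]
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]
    for _ in range(iters):
        w = mat_vec(c, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(data: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project rows onto the two leading principal components."""
    x = mean_center(data)
    c = covariance(x)
    comps = []
    for _ in range(2):
        v = leading_eigenvector(c)
        lam = sum(vi * wi for vi, wi in zip(v, mat_vec(c, v)))  # Rayleigh quotient
        # Deflation: remove the found component so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
        comps.append(v)
    return [
        (sum(a * b for a, b in zip(r, comps[0])), sum(a * b for a, b in zip(r, comps[1])))
        for r in x
    ]


def paint(coords: Sequence[Tuple[float, float]],
          values: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Attach a colour value in [0, 1] (min-max scaled property) to each 2D point."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(px, py, (v - lo) / span if span > 0 else 0.0)
            for (px, py), v in zip(coords, values)]


# Toy, made-up data: one feature row per galaxy plus a scalar to paint with.
features = [[float(i), 0.5 * (i % 2), 0.0] for i in range(10)]
stellar_mass = [10.0 + 0.1 * i for i in range(10)]
points = paint(pca_2d(features), stellar_mass)
```

The signs of the PCA axes are arbitrary, and a linear projection can miss nonlinear structure that a learned, nonlinear model could capture.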

Design and user interaction

BerndDoser commented 12 months ago

Instead of using GitHub issues, I recommend using a markup file in the space-dev repository or Overleaf directly for collaborating on text.