Asterics2020-Obelics / AstroLearning

Transversal astronomy, astrophysics and astroparticle machine learning group

Good practice protocol for ML on simulated data #1

Open vuillaut opened 6 years ago

vuillaut commented 6 years ago

Simulated data plays a major role in large physics experiments (not only in astronomy). In these experiments, it is often impossible to calibrate the instrument directly, or even to measure its response to a known, generated signal. For example, Imaging Atmospheric Cherenkov Telescopes observe atmospheric showers produced by high-energy particles entering the atmosphere, and such showers cannot be generated experimentally.

For machine learning applications, this can create specific problems when a model is trained on simulated data and then applied to real data. This discussion intends to collect, share and discuss these potential issues and possible ways to overcome them.
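
To make the concern concrete, here is a minimal toy sketch in pure Python. It is not taken from any experiment: the one-feature events, the `shift` parameter (standing in for a detector effect the simulation does not model) and all function names are assumptions made for illustration only. A threshold classifier is fitted on "simulated" events, then scored on events from the same simulation and on "real-like" events whose feature is offset.

```python
import random


def make_events(n, shift, rng):
    """Toy two-class events with one feature.

    `shift` is a hypothetical offset mimicking a detector effect
    that the simulation does not reproduce.
    """
    events = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 0 = background, 1 = signal
        centre = 1.0 if label == 1 else -1.0
        events.append((rng.gauss(centre + shift, 1.0), label))
    return events


def fit_threshold(events):
    """Midpoint between the two class means: a minimal classifier."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in events if y == cls]
        means.append(sum(values) / len(values))
    return 0.5 * (means[0] + means[1])


def accuracy(events, threshold):
    """Fraction of events where 'feature above threshold' matches 'signal'."""
    correct = sum((x > threshold) == (y == 1) for x, y in events)
    return correct / len(events)


rng = random.Random(0)
sim_train = make_events(5000, shift=0.0, rng=rng)
sim_test = make_events(5000, shift=0.0, rng=rng)
# Assumed mismatch between simulation and "real" data (illustrative value).
real_like = make_events(5000, shift=1.0, rng=rng)

threshold = fit_threshold(sim_train)
acc_sim = accuracy(sim_test, threshold)
acc_real_like = accuracy(real_like, threshold)

print(f"accuracy on simulated test set: {acc_sim:.3f}")
print(f"accuracy on shifted 'real-like' set: {acc_real_like:.3f}")
```

In this toy the shifted set has labels, so the drop in accuracy can be measured directly. With real observations the true labels are usually unknown, which is part of what makes the problem hard to diagnose.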

vuillaut commented 6 years ago

This issue has now been demonstrated in Shilon et al. (2018), who show a significant drop in performance when a classifier trained on simulated data is tested on real data. They do not propose a solution, though.

Adbhavna1369 commented 3 years ago

Thank you for providing the resources! :)