MultimodalUniverse / MultimodalUniverse

Large-Scale Multimodal Dataset of Astronomical Data
https://huggingface.co/MultimodalUniverse
MIT License

Real Type Ia Supernova Baselines #70

Open benboyd97 opened 3 months ago

benboyd97 commented 3 months ago

We've been thinking about how we might tackle the real type Ia supernova datasets. They are pretty small datasets, with 200-300 light curves and different filters for each survey, which makes directly applying a neural network a bit tricky. One avenue we've spoken about is to use the traditional sncosmo light-curve template fitter, which should be able to give us a photometric redshift estimate for each SN. @ado8 is going to have a go at implementing this to see whether the fitter is compatible with all the different survey filters we have.
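To make the template-fitting idea concrete, here is a toy sketch in plain Python. It is not sncosmo: it assumes a made-up single-band template (`template_flux`) and recovers the redshift purely from time dilation with a chi-square grid search. The function names, the Gaussian template shape and the redshift grid are all hypothetical, and a real fitter would also use the colour information across the survey filters.

```python
import math

def template_flux(t_rest):
    """Hypothetical rest-frame template: Gaussian-shaped light curve peaking at t = 0 days."""
    return math.exp(-0.5 * (t_rest / 10.0) ** 2)

def chi2(z, times, fluxes, errors):
    """Chi-square of the time-dilated template at redshift z.

    The amplitude is a linear parameter, so it is solved analytically.
    """
    model = [template_flux(t / (1.0 + z)) for t in times]
    num = sum(m * f / e ** 2 for m, f, e in zip(model, fluxes, errors))
    den = sum(m * m / e ** 2 for m, e in zip(model, errors))
    amp = num / den
    return sum(((f - amp * m) / e) ** 2 for m, f, e in zip(model, fluxes, errors))

def fit_photo_z(times, fluxes, errors, z_grid):
    """Return the grid redshift with the lowest chi-square."""
    return min(z_grid, key=lambda z: chi2(z, times, fluxes, errors))

# Synthetic, noise-free light curve generated at z_true (observer-frame days).
z_true = 0.3
times = [float(t) for t in range(-30, 31, 5)]
fluxes = [2.0 * template_flux(t / (1 + z_true)) for t in times]
errors = [0.05] * len(times)

z_grid = [i / 100 for i in range(1, 101)]  # 0.01 ... 1.00
z_best = fit_photo_z(times, fluxes, errors, z_grid)
print(z_best)
```

With noisy data from several surveys, the same loop would need per-band templates and filter handling, which is exactly the compatibility question raised above.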

Once we have a traditional baseline for photometric redshift estimation, we can compare it with an ML approach like a CNN, which @David-Chemaly is going to have a go at, to see how it performs despite the small amount of data.

We also spoke about combining the survey datasets through light-curve summary statistics provided by the sncosmo fitter, and then doing photometric redshift estimation across the surveys with a random forest. This is something we can do, but there are concerns that it is a little circular and will just reproduce the photometric redshift estimate given by the template fitter. However, we can test this later down the line!

Another avenue for these datasets that we spoke about, besides photometric redshift baselines, is inpainting, to see if we can predict missing light-curve points. Although this will still suffer from the problems of small datasets, a Gaussian Process might be a good approach for this, and maybe @erinhay, who has done supernova light-curve GPs, can provide insight.
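As a rough illustration of GP inpainting, the sketch below conditions a Gaussian Process on the observed points of a single-band light curve and predicts a held-out epoch. It is a minimal pure-Python version under stated assumptions: a squared-exponential kernel with fixed, hand-picked hyperparameters (`amp`, `scale`), a zero mean, and a synthetic light curve. None of these choices come from the datasets discussed here, and in practice the hyperparameters would be fitted.

```python
import math

def rbf(t1, t2, amp=1.0, scale=10.0):
    """Squared-exponential kernel; amp and scale are hypothetical fixed values."""
    return amp ** 2 * math.exp(-0.5 * ((t1 - t2) / scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_chol(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(t_obs, f_obs, f_err, t_new):
    """Posterior mean and standard deviation of a zero-mean GP at times t_new."""
    n = len(t_obs)
    # Kernel matrix with the measurement variances added on the diagonal.
    k = [[rbf(t_obs[i], t_obs[j]) + (f_err[i] ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_chol(low, f_obs)
    means, stds = [], []
    for t in t_new:
        k_star = [rbf(t, ti) for ti in t_obs]
        means.append(sum(ks * a for ks, a in zip(k_star, alpha)))
        v = solve_chol(low, k_star)
        var = rbf(t, t) - sum(ks * vi for ks, vi in zip(k_star, v))
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds

# Synthetic light curve with the peak epoch (t = 0) held out.
t_obs = [-20.0, -15.0, -10.0, -5.0, 5.0, 10.0, 15.0, 20.0]
f_obs = [math.exp(-0.5 * (t / 10.0) ** 2) for t in t_obs]
f_err = [0.02] * len(t_obs)

# Predict the held-out peak and an epoch far from any data.
means, stds = gp_predict(t_obs, f_obs, f_err, [0.0, 60.0])
print(means, stds)
```

The predicted uncertainty grows away from the observed epochs, which is the behaviour that makes a GP attractive for sparsely sampled light curves.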

We still think that these small datasets are useful for AstroPile, since they show the difficulties of working with real datasets. The reader could take inspiration from how the simulated PLAsTiCC light curves provided by @helenqu are analysed when thinking about how to solve the problem for smaller, inhomogeneous samples.

If anyone has any thoughts on these baselines let us know!

helenqu commented 3 months ago

I think the traditional baseline (SALT2 fit?) is a good idea to include, but it's probably important to remember that SALT2 fitters are "trained" on external datasets, like JLA, to produce fits for e.g. the CfA dataset. That makes them not directly comparable to an ML approach that trains on 80% of the CfA dataset to produce predictions on the remaining 20%.
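For reference, the 80/20 setup could look something like the sketch below, offered only as an illustration. The helper names, the sample size of 250 and the residual definition `(z_pred - z_true) / (1 + z_true)` are assumptions, not something already in the repo. Scoring the template fitter on the same held-out 20% at least keeps the evaluation set identical, even though its templates come from external data.

```python
import random

def split_indices(n_objects, test_fraction=0.2, seed=0):
    """Return (train, test) index lists from a reproducible shuffled split."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_objects * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def photo_z_residuals(z_pred, z_true):
    """Normalised residuals (z_pred - z_true) / (1 + z_true); an assumed metric."""
    return [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_pred, z_true)]

# Hypothetical survey with 250 supernovae.
train_idx, test_idx = split_indices(250)
print(len(train_idx), len(test_idx))
```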

benboyd97 commented 3 months ago

Yeah, I totally agree that we should be transparent that the SALT template approach uses templates/physics knowledge from elsewhere rather than only the data given.