ACCarnall / bagpipes

Bagpipes is a state of the art code for generating realistic model galaxy spectra and fitting these to spectroscopic and photometric observations. Users should install with pip, not by cloning the repository.
http://bagpipes.readthedocs.io
GNU General Public License v3.0
78 stars 40 forks source link

error when using get_advanced_quantities() #90

Open jiyoungchoe1 opened 1 week ago

jiyoungchoe1 commented 1 week ago

Hi,

I'm one of the beneficiaries using your code. Thanks for developing this code and the easy-to-understand manual! I've been trying to load already fitted .h5 file and to get advanced quantities like 'spectrum_full' or 'chisq_phot'. Then there's this error message that appears like below when I use fit.posterior.get_advanced_quantities().


AttributeError Traceback (most recent call last) Cell In[3], [line 99] [97] galaxy = pipes.galaxy('fitting_test', load_func, spectrum_exists=False, filt_list=filter_lists) [98] fit = pipes.fit(galaxy, fit_instructions, run=posterior_dir, n_posterior=1000) ---> [99] fit.posterior.get_advanced_quantities()

File ~/anaconda3/lib/python3.8/site-packages/bagpipes/fitting/posterior.py:202, in posterior.get_advanced_quantities(self) [199] self.fitted_model.lnlike(param) [201] if self.galaxy.photometry_exists: --> [202] self.samples["chisq_phot"][i] = self.fitted_model.chisq_phot

AttributeError: 'fitted_model' object has no attribute 'chisq_phot'


The thing is that this error happens totally randomly for some already fitted file I have. I somehow figured out why this happened and fixed it, but I'm not sure this is the exact reason and proper solution.

Here's why this error happens: Let's say I'm reading already fitted file using bagpipes 'fitting_test.h5' and it contains 1000 posteriors. When bagpipes unpacks and reads this file, (I believe) it reads each posterior in random order. If the model star formation histroy that corresponds to the first posterior has no problem, which means no stars formed before the Big Bang so that 'self.model_galaxy.sfh.unphysical = False', then 'self.fitted_model.chisq_phot' can be properly defined as below in fitted_model.py


    # Return zero likelihood if SFH is older than the universe.
    if self.model_galaxy.sfh.unphysical:
        return -9.99*10**99

    lnlike = 0.

    if self.galaxy.spectrum_exists and self.galaxy.index_list is None:
        lnlike += self._lnlike_spec()

    if self.galaxy.photometry_exists:
        lnlike += self._lnlike_phot()

where the 'self_lnlike_phot()' part defines the 'self.fitted_model.chisq_phot'. However, if the first posterior has a star formation history before the Big Bang, 'self.model_galaxy.sfh.unphysical' is defined as True, so 'self.fitted_model.chisq_phot' is not defined. Therefore, there is an error happening at 'self.samples["chisq_phot"][i] = self.fitted_model.chisq_phot' in posterior.py

I fixed this issue by adding 'self.chisq_phot = np.nan' above 'return -9.99*10**99' to rule out this posterior for analyzing, since spectrum_full is defined as np.zeros_like(self.wavelengths) for the posterior that corresponds to 'self.model_galaxy.sfh.unphysical = False'.

I'm not fully understand how bagpipes works yet, so please point out any errors in what I've said. Thanks for reading this, and I hope this helps on developing bagpipes.

ACCarnall commented 1 week ago

Hi,

Hmm, this is a really interesting issue, thanks for reporting it.

To be honest, I'm a little surprised that a sample point with an unphysical SFH made it into your posterior in the first place. Are a high fraction of the points in the posterior the same? It could be the case that there's actually some problem with either your model or data which is causing all the points in the posterior to come up as unphysical? Are you getting sensible looking output plots? Which sampling algorithm are you using, Nautilus or MultiNest?

I do agree however that the line you included should probably be there anyway, just to prevent old chi-squared values being associated with unphysical points in the posterior in case any do make it through. Would you like to submit a pull request to implement this, or would you prefer I just make the change?

Cheers, Adam

jiyoungchoe1 commented 6 days ago

Thanks for your quick reply!

First, I want to say sorry for giving you the wrong information about flagging sfh.unphysical as True. I didn't read the code carefully enough. There were actually two cases for the problem I had, and both cases flag sfh.unphysical as True for some posteriors.

Case 1 : using constant star formation history model with wrong prior range. I tried to fit photometric data with the following fit instruction.


constant = {} constant["age_min"] = (0,1) constant["age_max"] = (0,1) constant["massformed"] = (1,13) constant["metallicity"] = (0.001, 1) constant["massformed_prior"] = "log_10" constant["metallicity_prior"] = "log_10"

fit_instructions = {}
fit_instructions["redshift"] = 6. fit_instructions["constant"] = constant


Since I fixed the redshift as 6 (which corresponds to a cosmic age of 0.917 Gyr) while I set the prior range for 'age_max' up to 1 Gyr, there could be stars formed before the Big Bang for some posterior, which will flag the 'sfh.unphysical = True'.

Case 2 : using double power law star formation history model with wrong prior range. Again, I tried to fit photometric data with the following fit instruction.


dblplaw = {} dblplaw["tau"] = (0., 1.) dblplaw["alpha"] = (0.01, 1000.) dblplaw["beta"] = (0.01, 1000.) dblplaw["alpha_prior"] = "log_10" dblplaw["beta_prior"] = "log_10" dblplaw["massformed"] = (1,13) dblplaw["metallicity"] = (0.001, 1) dblplaw["massformed_prior"] = "log_10" dblplaw["metallicity_prior"] = "log_10"

fit_instructions = {}
fit_instructions["redshift"] = 6. fit_instructions["constant"] = constant


Similarly, redshift is fixed as 6, while I set the prior range for 'tau' up to 1 Gyr. This time there could be a posterior that corresponds to a tau greater than the age of the universe at the observed redshift, which will also flag the 'sfh.unphysical = True'.

I was expecting bagpipes to limit this prior range or stop the code if there's any problem when performing fitting, which I should have confirmed. Totally my bad. Still, I'm confused about the comment you wrote in Example 4 where the prior range for tau is defined, which says, "In practice, the code automatically stops this exceeding the age of the universe at the observed redshift.". I would be grateful if you could explain what this means in detail.

Here are my answers to your questions: Q. Are a high fraction of the points in the posterior the same? It could be the case that there's actually some problem with either your model or data which is causing all the points in the posterior to come up as unphysical? A. It was my model that has the problem, as I mentioned above. The input data itself is not the problem but kind of related to the fraction of posterior, which is flagged as unphysical. For example, one of my input photometry has extremely blue SED since most of the stars are younger than 5 Myr. When I use double power law SFH model, there are 1500 posteriors that are flagged as unphysical out of 1900. In order to mimic the input SED, high value of tau is needed to have a rich population of young stars, which is highly likely for tau to exceed the age of the universe at the observed redshift. Except this one, most of the data have less than 2% of a fraction of posterior that is flagged as unphysical.

Q. Are you getting sensible looking output plots? A. Since most of the data have less than 2% of a fraction of posterior that is flagged as unphysical, most of the output plots are fine. However, when fraction of unphysical posterior is high enough, some plots made with bagpipes are hard to analyze. The 'plot_spectrum_posterior' is one of the functions that is heavily affected by this. Every 'spectrum_full' of unphysical posterior is defined as np.zeros_like(self.wavelengths), so that the 16th percentile line of the spectrum is attached to the x-axis.

Q. Which sampling algorithm are you using, Nautilus or MultiNest? A. I'm using MultiNest.

Q. Would you like to submit a pull request to implement this, or would you prefer I just make the change? A. I think it's better to leave this to you.

I'm sorry again for the confusion and thanks for reading this!

ACCarnall commented 2 days ago

The confusion here I think is that (unlike the constant age parameter) the tau parameter for the double power law model starts at tau=0 at the Big Bang and increases forwards in cosmic time. When you allow a range in tau that reaches larger values than the age of the Universe at the observed redshift of the galaxy the code will automatically cap the tau parameter at the age of the Universe at z_obs. This is a separate mechanism to the unphysical SFH flag, since there's nothing unphysical about double power law models that peak after the time of observation, they're just very hard to distinguish between as they're all basically just a smooth rise in SFR, so I decided not to fit them.