bd-j / prospector

Python code for Stellar Population Inference from Spectra and SEDs
http://prospect.readthedocs.io
MIT License

Slowdown when fitting fBHB #187

Closed: RGleis closed this issue 3 years ago

RGleis commented 4 years ago

I realized that in past test runs I had forgotten to pass the model parameter describing the fraction of blue horizontal branch stars (fBHB) in my stellar population, so it had been fixed at the default of zero. Now that I'm actually fitting the fBHB parameter rather than just mass, logzsol, and tage (as in the demo), my runs have slowed drastically, from ~40 minutes to fit 17 objects to days for a single object. This is without changing the number of emcee walkers or iterations. I also added a spectrum (previously I only had photometry), but there was a test run in between with only the fBHB change, and it was incredibly slow as well.

Why do you think this is happening? If this is unavoidable I can submit my fits to a computing cluster but it seems strange enough that there may be something weird going on. It doesn't throw any exceptions and it does show that it's progressing, just very slowly.

RGleis commented 4 years ago

These are my settings for the fBHB parameter:

    from prospect.models import priors

    fbhb_param = {"name": "fbhb",
                  "N": 1,
                  "isfree": True,
                  "init": 0.1,
                  "prior": priors.TopHat(mini=0.0, maxi=0.5),
                  "init_disp": 0.05,
                  "disp_floor": 0.01,
                  "units": "unitless fraction",
                  }

I saw in another issue that the MIST isochrones can be slow when changing the metallicity but I was already fitting metallicity and I've been using Padova isochrones so I can't really see that being the problem. I did a drastically reduced test run (16 walkers, 100 iterations) and it completed one source after 4 hours.

bd-j commented 4 years ago

There is a set of parameters -- fsps.StellarPopulation().params.ssp_params -- that, if changed, trigger a regeneration of the SSPs (as opposed to simply changing how they are combined). The fbhb parameter is one of these, since it modifies the isochrones. There is nothing to be done to mitigate this. Note that if you switch to MIST the runtime will likely increase much more, since those SSPs take even longer to generate.
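For reference, here is a quick way to confirm this and time the cost directly in python-fsps (a sketch, not from the thread; the zcontinuous setting and the tage/fbhb values are placeholders):

    import time
    import fsps

    sp = fsps.StellarPopulation(zcontinuous=1)
    print("'fbhb' in ssp_params:", "fbhb" in sp.params.ssp_params)   # expect True

    sp.get_spectrum(tage=1.0)      # first call always builds the SSPs, so it is slow

    t0 = time.time()
    sp.get_spectrum(tage=2.0)      # changing tage only recombines the cached SSPs
    print("tage change took %.1f s" % (time.time() - t0))

    sp.params["fbhb"] = 0.2        # an ssp_params entry, so the SSPs get rebuilt
    t0 = time.time()
    sp.get_spectrum(tage=2.0)
    print("fbhb change took %.1f s" % (time.time() - t0))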

bd-j commented 4 years ago

You might consider conducting multiple fits with different fixed values of fbhb (i.e., a grid in fbhb) and then combining the posteriors, normalized by the evidence from dynesty.

RGleis commented 4 years ago

Can you elaborate on what you mean by combining the posteriors normalized by the evidence from dynesty? As I understand it, the evidence is the model-independent factor in the denominator of Bayes' theorem, but it isn't actually involved in an MCMC run. My interpretation is that the prior changes between the runs with different fbhb, and dynesty provides a tool to calculate the evidence to correct for this so the posteriors can be fairly compared. Is this correct, or am I missing something? Does it mean I need to do the fit with dynesty rather than emcee?

bd-j commented 4 years ago

Yes, that's what I meant, and yes, you'd need dynesty fits rather than emcee, since the posterior PDFs from the latter are unnormalized and can't be easily compared in absolute terms once the model (not necessarily even the priors) has changed (by changing fBHB). The dynesty docs or paper may provide useful info. To be honest, I've never done this before, but I think in principle it should be possible. Tagging @joshspeagle, who may have thoughts or suggestions.

joshspeagle commented 4 years ago

Yes, in principle this should be possible. The terminology would be “Bayesian Model Averaging”, and you’d just be assuming that each model (with fixed fbhb) should be weighted according to the expectation value of the likelihood of that model averaged over the prior (i.e. the evidence). This would implicitly be marginalizing over fbhb.
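As an illustration of that scheme, here is a minimal sketch (not from the thread) of evidence-weighted averaging over a grid of fixed-fbhb dynesty runs; the results container, the function name, and the flat default grid weighting are all assumptions:

    import numpy as np

    def combine_fixed_fbhb_runs(results, log_prior=None):
        """Stack samples from runs at fixed fbhb, weighted by prior * evidence.

        results   : dict mapping fbhb value -> dynesty results object
                    (with .samples, .logwt, and .logz arrays)
        log_prior : optional dict mapping fbhb value -> log prior weight
                    (defaults to a flat weighting over the grid)
        """
        fbhbs = sorted(results)
        logz = np.array([results[f].logz[-1] for f in fbhbs])
        logp = np.array([0.0 if log_prior is None else log_prior[f] for f in fbhbs])
        logw_model = logz + logp
        logw_model -= np.logaddexp.reduce(logw_model)    # normalize the model weights

        samples, weights = [], []
        for f, lwm in zip(fbhbs, logw_model):
            r = results[f]
            w = np.exp(r.logwt - r.logz[-1])             # within-run importance weights
            w /= w.sum()
            samples.append(r.samples)
            weights.append(np.exp(lwm) * w)
        return np.vstack(samples), np.concatenate(weights)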

bd-j commented 4 years ago

Thanks! And then the spacing of the grid in fbhb would constitute an implicit prior on that parameter, yes?

joshspeagle commented 4 years ago

I was assuming the spacing would be uniform, but yeah, I guess so! You could also just impose a prior weighting scheme.

RGleis commented 4 years ago

Thanks, this gives me a good jumping-off point. I'll leave this issue open for now in case anything else comes up.

RGleis commented 3 years ago

I've been trying different things for this and I've found that the dynesty runs fail on some sources with some fBHB values. They go through the initial dynesty run then hang at the end of the final run (typically iter >5000). There's no exception thrown and as far as I can tell it doesn't seem to be exceeding the resources of the server. It doesn't seem like a problem with the data either since it occurs for some fBHB values and not others, even on the same object. Do you have any ideas?

As for merging the runs that did finish, my plan is to manually edit the ['sampling']['chain'] and ['bestfit']['parameter'] fields to append the fBHB value, in the hope of fooling dynesty.utils.merge_runs(result_list) into treating them as parallel runs over the same prior, so I can merge them as described at https://dynesty.readthedocs.io/en/latest/quickstart.html#combining-runs. What do you think of this plan?

bd-j commented 3 years ago

Hmm, there was a problem with excessive memory usage causing runs to fail but that was fixed a few months ago (4d62832c3). Otherwise I'm not sure. Are you fitting spectroscopic data?

I don't know enough about how merge_runs works to be sure, but that might well work. Bat signal for @joshspeagle. (Note that "bestfit" is not important for dynesty; to fool dynesty you'll need to create a dictionary that renames some of the vectors in "sampling": "lnprobability" -> "logl", "chain" -> "samples", and I think you need the ["sampling"]["logz"] array too.)

RGleis commented 3 years ago

Yes, I'm fitting spectra.

joshspeagle commented 3 years ago

> in order to try to fool dynesty.utils.merge_runs(result_list) so hopefully it will think that they are just parallel runs of the same prior

I would discourage you from trying this since I'm not 100% sure it would work. I would instead recommend using resample_equal to get a new set of samples for each run, weighting them proportional to the p(fBHB) value from the prior for that run, and then combining the weighted samples from all the runs into one giant set of weighted samples.
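A sketch of that recommendation (not from the thread; the container names and the prior_pdf callable are assumptions, and an evidence factor from the earlier discussion could be folded into the same per-run weight):

    import numpy as np
    from dynesty.utils import resample_equal

    def stack_fixed_fbhb_runs(results, prior_pdf):
        """results: {fbhb: dynesty results}; prior_pdf: callable giving p(fbhb)."""
        samples, weights = [], []
        for fbhb, r in results.items():
            w = np.exp(r.logwt - r.logz[-1])     # nested-sampling importance weights
            w /= w.sum()
            eq = resample_equal(r.samples, w)    # equal-weight posterior draws for this run
            samples.append(eq)
            weights.append(np.full(len(eq), prior_pdf(fbhb)))
        weights = np.concatenate(weights)
        return np.vstack(samples), weights / weights.sum()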

bd-j commented 3 years ago

For the stalling issue, a couple things. Do you see the efficiency (as listed in the progress line) dropping to very low values? Is the hang actually at the "end", i.e. has the stopping criterion been satisfied and the problem is somehow with final computations? Is it in the dynamic part (batch > 0) or the initial pass? You might try setting a maximum number of model calls or sampling iterations by adding the nested_maxiter_call or nested_maxiter_batch argument and setting it to something like int(5e6) (or 3000 for number of iterations). For the objects that stall only on some fBHB values, do the fits look reasonable for the fBHB values where they don't stall? Or are they pegged against the edge of some prior?

Some possible mitigation strategies: make nested_posterior_thresh a larger number, inflate the uncertainties on the spectra, make sure all possible bad pixels (residual sky emission, Na D ISM absorption, etc.) are masked. Beyond that you might need a spectroscopic noise jitter term or spectroscopic outlier model, but try the others first.
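On the masking point, a sketch (not part of the thread) of how a boolean obs["mask"] could be built; the wavelength windows are illustrative placeholders and would need to be shifted to the observed frame:

    import numpy as np

    def mask_bad_pixels(obs, windows=((5570., 5585.),      # residual sky line (placeholder window)
                                      (5885., 5900.))):    # Na D ISM absorption (placeholder window)
        """Add a boolean obs['mask'] that drops non-finite pixels and listed windows."""
        wave = obs["wavelength"]                            # observed-frame wavelengths, Angstroms
        mask = np.isfinite(obs["spectrum"]) & (obs["unc"] > 0)
        for lo, hi in windows:
            mask &= ~((wave > lo) & (wave < hi))
        obs["mask"] = mask                                  # True = pixel is used in the fit
        return obs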

RGleis commented 3 years ago

Here are my dynesty run params:

    run_params = {'verbose': True, 'debug': True,
                  # dynesty Fitter parameters
                  'nested_bound': 'multi',  # bounding method
                  'nested_sample': 'unif',  # sampling method
                  'nested_nlive_init': 100,
                  'nested_nlive_batch': 100,
                  'nested_bootstrap': 0,
                  'nested_dlogz_init': 0.05,
                  'nested_weight_kwargs': {"pfrac": 1.0},
                  }

    run_params["nested_posterior_thresh"] = 0.05
    run_params["nested_maxcall"] = int(1e6)
    run_params["nested_maxiter_call"] = int(7e5)

Here's the output from the last frozen one.

    <my home directory>/.local/lib/python3.8/site-packages/prospect-0.4.0-py3.8.egg/prospect/models/priors.py:104: RuntimeWarning: divide by zero encountered in log
    iter: 5637 | batch: 0 | nc: 1 | ncall: 12821 | eff(%): 43.967 | logz: -3763882.020 +/- 0.996 | dlogz: 0.000 > 0.050
    done dynesty (initial) in 155.2293312549591s
    iter: 6382 | batch: 2 | nc: 1 | ncall: 13936 | eff(%): 45.336 | loglstar: -3763834.355 < -3763828.209 < -3763830.886 | logz: -3763881.842 +/- 0.992 | stop: 1.758

In contrast, here's the output from the last one that succeeded:

    <my home directory>/.local/lib/python3.8/site-packages/prospect-0.4.0-py3.8.egg/prospect/models/priors.py:104: RuntimeWarning: divide by zero encountered in log
    iter: 5684 | batch: 0 | nc: 1 | ncall: 12131 | eff(%): 46.855 | logz: -3746314.831 +/- 1.000 | dlogz: 0.000 > 0.050
    done dynesty (initial) in 149.3077552318573s
    iter: 7854 | batch: 5 | nc: 3 | ncall: 15092 | eff(%): 50.889 | loglstar: -3746268.283 < -3746262.956 < -3746264.188 | logz: -3746314.553 +/- 0.988 | stop: 1.163
    done dynesty (dynamic) in 58.89828610420227s

They are on the same source, with spectra included and rescale_spectrum on (by the way, is the rescaling factor stored in the output? I can't find it for making my plots), with different fbhb values (0.1 and 0.2 respectively; I'm trying to figure out the problem on reduced runs with just fbhb = 0, 0.1, and 0.2), and I think I've masked out all the bad values from my spectra. The successful runs on this object do seem to be bumping up against my age prior, and I'm suspicious of the logzsol (it's ~0.18 and the object is supposed to be a globular cluster), but my other objects seem fine and are also less prone to freezing.

It's worth noting that we have a slightly negative redshift for this object in some archival spectral redshift data while all the others are positive. Maybe that's relevant? The fbhb = 0 run for this object required about 6x10^5 of its maximum 7x10^5 iterations. For completeness, that one's output was

    <my home directory>/.local/lib/python3.8/site-packages/prospect-0.4.0-py3.8.egg/prospect/models/priors.py:104: RuntimeWarning: divide by zero encountered in log
    iter: 5421 | batch: 0 | nc: 1 | ncall: 12030 | eff(%): 45.062 | logz: -3728262.062 +/- 0.972 | dlogz: 0.000 > 0.050
    done dynesty (initial) in 134.1306962966919s
    iter: 666470 | batch: 3 | nc: 1 | ncall: 1000116 | eff(%): 66.635 | loglstar: -3728217.568 < -3728213.324 < -3728213.090 | logz: -3728261.898 +/- 0.964 | stop: 1.418
    done dynesty (dynamic) in 8021.169002532959s


bd-j commented 3 years ago

Something seems off; the efficiency is very high, and 6e5 iterations is a very large number.

You should probably be fitting for redshift, and using a lumdist parameter for the luminosity distance (so that it is not taken from the redshift + cosmology, which might be undefined for negative z?)
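A sketch of that change in the same parameter-dictionary format used for fbhb above (not from the thread; the prior range and distance value are placeholders, and model_params stands in for whatever parameter container the fit uses):

    from prospect.models import priors

    model_params["zred"] = {"N": 1, "isfree": True, "init": 0.0,
                            "units": "redshift",
                            "prior": priors.TopHat(mini=-0.002, maxi=0.002)}
    model_params["lumdist"] = {"N": 1, "isfree": False, "init": 0.01,
                               "units": "Mpc"}   # fix to the known cluster distance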

The rescale_spectrum parameter simply scales the observed spectrum (and uncertainties) to have a median = 1; the factor is stored in the obs dictionary as "rescale", since it is fundamentally an operation on the data. I would actually recommend against using it. Are you also fitting photometry? Are you using PolySedModel or PolySpecModel, both of which can optimize/fit for a normalization factor? Are your spectra flux calibrated? Are you handling the line spread function?
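For pulling that factor back out when plotting, a sketch (my addition, based on the comment above; the filename is hypothetical and the multiply-back direction should be double-checked against your obs values):

    import prospect.io.read_results as reader

    res, obs, model = reader.results_from("gc_fit_output.h5")   # hypothetical filename
    rescale = obs.get("rescale", 1.0)
    spec_original_units = obs["spectrum"] * rescale              # undo the median = 1 normalization
    unc_original_units = obs["unc"] * rescale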

Can you send the model description (i.e. the result of print(model))? Feel free to email as well if that's better. See also https://github.com/bd-j/GCPro for some tips on fitting GCs with prospector (this also uses the more modern interaction via command line options + parser instead of an explicitly edited run_params dictionary.)

bd-j commented 3 years ago

Assuming this is resolved, please reopen if there are still problems.

joel-roediger commented 1 year ago

@joshspeagle I am trying to implement the solution that you recommended above for combining the results of Prospector runs with differing (fixed) values of the parameter fbhb. I am not sure what you meant above by "... weight them proportional to the p(fBHB) value from the prior for that run, then just combine the weighted samples from all the runs as just a giant set of weighted samples". Am I to weight the likelihoods of the samples or something else? And would it be straightforward to use the plotting tools offered in dynesty on the combined set of weighted samples?