Starfish-develop / Starfish

Tools for Flexible Spectroscopic Inference
https://starfish.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Mixture model mode #35

Closed: gully closed this issue 5 years ago

gully commented 8 years ago

The problem: Many astrophysical spectra are superpositions of two or more individual spectra, for example from unresolved multiple systems or starspots.

Suggested solution: The composite spectrum can be modeled as a mixture model, M_mix = f M_B + (1-f) M_A, where f is a filling factor and M_A, M_B are stellar models as defined in Czekala et al. 2015.
Depending on the application, the model components will share some of the same stellar parameters. For example, if the composite spectrum arises from starspots, the two components should share the same log g, metallicity, vsini, and RV; if it originates from a multiple system, the components should have independent values for the stellar parameters. Naturally, if the flux ratio is too low, the secondary component will not be detectable, but in that case the MCMC machinery should deliver a robust upper limit on the flux ratio, f.
One could imagine a solution intermediate between vanilla Starfish and the full mixture model-- one mixture component is held fixed, and the other component is varied.
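A minimal sketch of the composition step, assuming both component fluxes have already been resampled onto a common wavelength grid (the function name is illustrative, not part of Starfish):

import numpy as np

def mixture_flux(flux_A, flux_B, f):
    # Composite spectrum: M_mix = f * M_B + (1 - f) * M_A,
    # where f is the filling factor of component B
    return f * np.asarray(flux_B) + (1.0 - f) * np.asarray(flux_A)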

Practical Considerations and Costs: There are some practical limits to searching for buried signals like this. Primaries or secondaries that span a very wide range in stellar properties will require a large spectral emulation volume. The spectral emulation procedure is computationally expensive for large parameter ranges, so it might make sense to chop up the spectral emulation into two islands surrounding the expected stellar properties for components A and B, if constraints exist. If these constraints do not exist, then the MCMC machinery might struggle to land on a unique global solution for the secondary, which is fine, but should be anticipated.

iancze commented 8 years ago

I do really like the idea of building forward models of composite spectra. There are of course a lot of existing tools already out there (e.g., TODCOR) that provide some of this functionality for multiple systems, if not the ability to fully explore forward models. As a test object, it might be easiest to start with a near equal mass ratio binary where the lines are clearly separated. This way you can use the same emulator and be certain that you know which lines belong to which star (equal S/N for each source). If you don't have an object like this, I can provide you with some spectra for DQ Tau (two M0's).

Moving to fainter objects, I'm worried that a single epoch of observations would either provide enough information to easily see the spectrum of the second source in superposition (if so, great), or, if the secondary is very faint, land you in a systematics-dominated trap, where uncertainties in the model spectrum may masquerade as possible detections. Maybe this is a parameter space that is worth exploring, but if the primary purpose is detection, then I wonder if an approach like that of @kgullikson88 would more easily yield fruit.

gully commented 8 years ago

Great comments here, and I think I agree with the limitations you noted. In general, I think MCMC for blind detection will not work in this case-- there are too many systematics. Where I think the technique could shine (maybe) is when another method (e.g. visual ID, Kevin's method, TODCOR, etc.) produces a detection, and one then seeks a refined measurement of the properties of the two components. The systematics-dominated trap is the big catch though... A line-by-line chunking strategy could yield some resilience here, but I'm not sure.

In any case, I have implemented this mixture model and I've been experimenting with it on a composite spectrum (starspot model). The main problem I've had is that in going from 6 to 8 stellar parameters (adding Teff2 and logOmega2), the sampler mixing has become much slower. This is one aspect of the curse of dimensionality. I've tried a few transformations of the variables (c/(1-c) → logOmega2, etc.), but the problem persists.

When brainstorming ideas for enhancing the sampler mixing, I thought about the "Stretch Move" from the emcee algorithm. The emcee implementation uses parallel walkers, which wouldn't be useful for us since the code is already parallelized by spectral order. But in principle the algorithm can be used in serial as defined in Algorithm 2 of Dan's paper. My gut reaction is that this change would require more work than it's worth, and might not work with the Gibbs Sampling framework, causing all sorts of implementation messes. It might even be wrong? But any advice from experts here is appreciated. :pager: @dfm.
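For reference, a minimal sketch of one serial stretch-move update, following Algorithm 2 of Goodman & Weare (2010); this is illustrative only, and (per the replies below) does not slot into the Gibbs framework:

import numpy as np

def stretch_move(x_i, x_j, lnprob, lnp_i, a=2.0, rng=np.random.default_rng()):
    # Update walker x_i against a randomly chosen complementary walker x_j
    ndim = len(x_i)
    z = (1.0 + (a - 1.0) * rng.random())**2 / a  # z ~ g(z) ∝ 1/sqrt(z) on [1/a, a]
    y = x_j + z * (x_i - x_j)                    # stretch proposal
    lnp_y = lnprob(y)
    if np.log(rng.random()) < (ndim - 1) * np.log(z) + lnp_y - lnp_i:
        return y, lnp_y  # accept
    return x_i, lnp_i    # reject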

By the way, the heuristic workaround for now is simply to run the sampler for an extended burn-in period, then use the chain's covariance matrix as the proposal distribution for a new sampler. In practice I find myself doing this multiple times, also with multiple initializations, and the results are just okayish.
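Concretely, that restart heuristic looks something like this (a sketch, assuming samples holds the burn-in chain as an (n_steps, n_dim) array; the standard 2.38^2/d adaptive-Metropolis scaling of the covariance is omitted):

import numpy as np

def empirical_proposal(samples, rng=np.random.default_rng()):
    # Estimate a Metropolis-Hastings proposal from a burn-in chain
    cov = np.cov(samples, rowvar=False)
    def propose(p):
        # multivariate-normal jump shaped by the empirical covariance
        return rng.multivariate_normal(p, cov)
    return propose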

dfm commented 8 years ago

Unfortunately it's not really possible to use an ensemble method inside a Gibbs framework even if it's not parallel. All of the walkers need to be sampling from the same underlying distribution and if you work it through it's hard to make that work inside of Gibbs.

davidwhogg commented 8 years ago

Yes; all my ideas in this arena are amazingly inefficient; ensembles and Gibbs not good. Interesting to think if there is a generalization of ensemble methods you might use, but this is a research project in sampling, not spectroscopy.

gully commented 8 years ago

Thanks y'all. In hindsight I see there are identical issues on this already here and here, so this problem must sound like a broken record... there's gotta be a joke about DJ-/emcee-ing in there somewhere :microphone:. Anyways, it makes sense why this problem keeps coming up: MH + Gibbs in strongly correlated, high-dimensional spaces is tricky without alternative machinery, i.e. emcee or Stan.

One of the virtues of the current spectral inference framework is its extensibility, which in part means being able to add new physics to the model, which ultimately means adding more parameters and therefore dimensions. But when adding more dimensions makes sampling N-squared harder, the benefit of the framework is diminished.

Anyways, as a workaround I've tried something new. Why not get rid of the Gibbs sampling, operate on each order independently of the rest of the spectrum, and use emcee on the stellar and calibration parameters simultaneously?

I did this, and it seems to work much better. I went from Gibbs sampling with 6 stellar then 6 nuisance parameters to emcee sampling 12 total parameters. I don't see any reason why this is illegal, but I'm all ears on being proven wrong :oncoming_police_car:.

One might complain that you'll have to combine the set of stellar parameter estimates at the end in some heuristic way, losing the value of the distributions you worked so hard to get in the first place. True. Or your fidelity in flagging outlier spectral lines will be diminished. Also true. But in a systematics limited problem, an order-by-order approach provides an extra layer of resilience by being able to down-weight orders that have poor fits, or identify orders that have exceptional fits. It could be considered a first pass.

What to do with the set of stellar parameter estimates is problem-specific. For example, there could be astrophysically interesting trends in the stellar estimates as a function of wavelength; the Zeeman effect, for instance, scales with wavelength (line splitting grows as λ²). Or one could feed the information back into a constrained full-spectrum fit, maybe in a hierarchical way.

As far as implementation, all I did was merge star.py and parallel.py, and hack apart all the parallelization and revert_Theta() stuff. I run update_Theta() and update_Phi() in serial and it all works fine.

import numpy as np
import Starfish.constants as C
from Starfish.model import ThetaParam, PhiParam  # assuming the pre-refactor module layout

# SampleThetaPhi is defined in the hack script itself (star.py merged with parallel.py)
model = SampleThetaPhi(debug=True)

model.initialize((0, 0))

def lnprob_all(p):
    try:
        # p[0:3] = grid parameters (e.g. Teff, logg, Z); then vz, vsini, logOmega
        pars1 = ThetaParam(grid=p[0:3], vz=p[3], vsini=p[4], logOmega=p[5])
        model.update_Theta(pars1)
        # hard code npoly=3 (for fixc0 = True with npoly=4)
        pars2 = PhiParam(0, 0, True, p[6:9], p[9], p[10], p[11])
        model.update_Phi(pars2)
        lnp = model.evaluate()
        return lnp
    except C.ModelError:
        model.logger.debug("ModelError in stellar parameters, sending back -np.inf {}".format(p))
        return -np.inf

The current implementation is a stand-alone hack script. It's probably possible to subclass things in a clever way, i.e. DRY. Full code is here.
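For completeness, a sketch of how lnprob_all might be handed to emcee (the walker count and ball size are illustrative, and p0 stands in for a sensible 12-parameter starting guess):

import emcee
import numpy as np

ndim, nwalkers = 12, 40
p0 = np.zeros(ndim)  # replace with a real starting guess (6 stellar + 6 nuisance)

# initialize the walkers in a small Gaussian ball around the guess
pos0 = p0 + 1e-4 * np.random.randn(nwalkers, ndim)

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob_all)
sampler.run_mcmc(pos0, 5000)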

gully commented 8 years ago

(I'm still wrapping my head around whether my implementation is Kosher in the multi-threaded case.)

[screenshot attached, 2016-04-06]

iancze commented 8 years ago

Hi Gully,

As I understand it, what you are doing now is fitting one order at a time, completely independently from everything else. In this case, instead of using a Gibbs step to alternate between calibration parameters and stellar parameters, you are just fitting everything with one emcee instance (let me know if this is incorrect). If that is the case, I don't see any problem with what you are doing now (you probably know that you need to run your MCMC longer). In fact, this is how I had originally coded things for the first few months of the project.

As you point out, though, the question then is what to do with the inevitably different and non-overlapping posteriors that you end up with. I had switched over to the Gibbs framework because I eventually decided that I was most interested in the global estimate of Teff, logg, and Z... systematics be damned. With this framework, I was asserting that each order needed to have the same spectrum originating from the same star, and that Teff -> spectrum in exactly the same way for all wavelengths. The covariance kernels were designed to pick up the systematics and downweight their influence.

Of course, there is a whole other way of framing the problem that is equally interesting and addresses problems that we know exist. Since we know that the spectral models are not perfect (no one Teff fits all orders equally well, and there do exist trends w/ wavelength, like you mention), instead just see what parameter values fit best for each order. Then, at the end of the day do something fancy combining the posteriors from each order to identify discrepant regions and arrive at some robust estimate of the global stellar properties.

This is essentially the idea behind the chunking proposal, but instead of focusing on orders, focus on 2 Å chunks. For FGK stars, this will quickly weed out a lot of the bogus, systematics-dominated lines, since these will give unrealistic values for the stellar parameters. I haven't yet fully worked out the details of how to properly combine all of the posteriors in the end. I had originally envisioned doing each of these chunks completely independently from the others (the sampling and coding then become very fast). However, when I chatted w/ @dfm last fall about this, he cautioned that a simple "combine at the end" may not be what you want hierarchically, and that the proper sampling might not be so independent after all.

However, purely from a standpoint of "is there any region of parameter space that fits this spectral line?", I still think the chunking exercise is very worthwhile to carry out, since we could see which lines are biasing Teff, logg, Z, etc.
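A minimal sketch of the chunking step itself, assuming 1-D wavelength/flux arrays with wavelengths in Å (how to properly combine the per-chunk posteriors is, per the caution above, still open):

import numpy as np

def chunk_spectrum(wl, fl, width=2.0):
    # split a spectrum into contiguous chunks of `width` angstroms
    edges = np.arange(wl.min(), wl.max() + width, width)
    idx = np.digitize(wl, edges)
    return [(wl[idx == i], fl[idx == i]) for i in np.unique(idx)]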

gully commented 8 years ago

OK, thank you very much for the feedback, one and all. This conversation thread has been tremendously helpful in reflecting on the right thing to do, and looking ahead at big picture stuff.

I'm very excited about the line-by-line chunking feature. I'd be willing to help implement it, but I do not know where to begin with the hierarchical stuff. A nudge in that direction could help.

mileslucas commented 5 years ago

My updates to Starfish (currently on the refactor/core-numpy branch) will allow extremely straightforward mixtures:

from Starfish.models import SpectrumModel

param_dict_a = ...
param_dict_b = ...

M_a = SpectrumModel(..., **param_dict_a)
M_b = SpectrumModel(..., **param_dict_b)

flux_a, cov_a = M_a()
flux_b, cov_b = M_b()

final_flux = flux_a + flux_b
final_cov = ...  # whatever is appropriate here

R = final_flux - data
# ... continue with MLE

One of the big goals of rewriting Starfish was to allow this API approach so that the science became more plug-and-playable. For instance, if I wanted a simple blackbody for a dust ring, all I would do is replace M_b in the above example with a Planck function.
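For instance, a minimal sketch of such a drop-in blackbody component (cgs units; absolute normalization and solid-angle factors omitted, and not part of the Starfish API):

import numpy as np

def planck_flux(wl_angstrom, T):
    # B_lambda in cgs (erg / s / cm^2 / cm / sr); wl in angstroms, T in kelvin
    wl = np.asarray(wl_angstrom) * 1e-8  # angstroms -> cm
    h, c, k = 6.62607e-27, 2.99792e10, 1.38065e-16  # cgs constants
    return (2.0 * h * c**2 / wl**5) / np.expm1(h * c / (wl * k * T))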

iancze commented 5 years ago

This is really great. Simplifying the API in this manner goes a long way towards achieving the goal of a modular spectroscopic inference package!