Starfish-develop / Starfish

Tools for Flexible Spectroscopic Inference
https://starfish.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Rewriting the Habib Likelihood Equation #119

Open · mileslucas opened 5 years ago

mileslucas commented 5 years ago

@iancze @gully I would love some help on this.

So the Habib likelihood function is great for fitting the GP hyperparameters to the spectra, but I've been wondering if we can rewrite it from the perspective of tensors.

To start, instead of representing Sig_w as a block diagonal, why not just use an (m, M, M) tensor? (M is the number of input parameter combinations, i.e. grid spectra, and m is the number of eigenspectra, following the nomenclature from the literature/existing code.)

Then, when we look at Phi, why not represent it as a tensor with shape (M, Npix, m) or similar, so that matrix multiplication with a sample of w creates a tensor with shape (M, Npix)? That should also be the shape of the fluxes F, instead of concatenating them all together.
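
A minimal NumPy sketch of the proposed shapes (all names and values here are illustrative stand-ins, not from the existing codebase):

```python
import numpy as np

M, Npix, m = 10, 3000, 4            # grid spectra, pixels, eigenspectra

Phi = np.random.randn(M, Npix, m)   # proposed tensor form of Phi
w = np.random.randn(M, m)           # one weight vector per grid spectrum

# batched matrix-vector product over the grid axis:
# (M, Npix, m) contracted with (M, m) -> (M, Npix)
F = np.einsum('ijk,ik->ij', Phi, w)
assert F.shape == (M, Npix)         # same shape as the flux grid

# Sig_w as a stacked tensor instead of a block diagonal
Sig_w = np.stack([np.eye(M)] * m)   # (m, M, M); identity as a stand-in GP kernel
```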

I fiddled around with the above shapes, trying to work through the appendix and match the shapes of everything, but I fell short when trying to get the MLE of w: instead of getting shape (m,) I always had a rank-2 tensor.

If we can figure out a way to rework the Habib implementation from the view of tensors, I think our code will be much more elegant and easier to parse, as well as more easily vectorized using PyTorch or similar.

iancze commented 5 years ago

I do think it's possible to rewrite the Habib et al. likelihood as a product of tensors. And to be honest, it would probably make it easier to conceptualize what's going on. The fact that we have to stack the entire spectral library end to end into a giant 1D vector (script F) makes it a little difficult to keep track of everything. If you have a way to implement this in PyTorch/TensorFlow, then I'm all for rewriting F as an (M, Npix) matrix and rewriting Phi as something like (M, Npix, m). I don't think it matters all that much what dimensionality things are in, as long as at the end of the day, when we are training the emulator, we are calculating the likelihood via an equation equivalent to Eqn 45 (which is built from Eqns 42, 43, 44, and 46).
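
For concreteness, here is a rough schematic of that stacked training likelihood: a zero-mean multivariate normal over script F with a covariance built from the stacked Phi, the weight covariance, and a white-noise nugget. This is meant to mirror the structure of the equation, not reproduce Eqn 45 term for term, and every name and value is a stand-in:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.linalg import block_diag

M, Npix, m = 5, 100, 3                       # small sizes so the dense matrix fits in memory

F = np.random.randn(M, Npix)                 # standardized flux grid
Phi_full = np.random.randn(M * Npix, M * m)  # stacked Phi from the paper's 1D formulation
Sig_w = block_diag(*[np.eye(M)] * m)         # (Mm, Mm) weight covariance, stand-in kernels

# schematic covariance of the stacked fluxes: Phi Sig_w Phi^T plus a noise term
C = Phi_full @ Sig_w @ Phi_full.T + 1e-2 * np.eye(M * Npix)
logL = multivariate_normal(mean=np.zeros(M * Npix), cov=C).logpdf(F.ravel())
```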

Can you explain your comment about Sig_w (Eqn 52, v2 paper) being a block-diagonal matrix? If I'm remembering everything correctly, it should just be a dense (m x m) matrix that describes the covariance between eigenspectra weights (m is the number of eigenspectra). Similarly, mu_w (Eqn 51) is just a length-m vector. These quantities should only come into play after the emulator is already trained and we're using it to query spectra in the process of a stellar parameter fit.
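
For example, querying the trained emulator might look roughly like this (the eigenspectra, mu_w, and Sig_w below are stand-in values, not a trained model):

```python
import numpy as np

m, Npix = 4, 3000
eigenspectra = np.random.randn(m, Npix)  # stand-in for the trained eigenspectra
mu_w = np.zeros(m)                       # Eqn 51: mean of the weights at the query parameters
Sig_w = np.eye(m)                        # Eqn 52: dense (m, m) covariance between weights

w = np.random.multivariate_normal(mu_w, Sig_w)  # draw one realization of the weights
spectrum = w @ eigenspectra                     # reconstructed (Npix,) spectrum
```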

mileslucas commented 5 years ago

I did a little more reading in the Habib paper, and really all it is is a multivariate normal likelihood. The tricks come in when reducing the dimensionality and creating w_hat. In my short brainstorming I had trouble understanding how to construct an MLE for w, given residuals R = F - Phi * w with shape (M, Npix), that comes out as a vector of length m.
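
One way to see where the extra rank comes from: solving the least-squares problem per grid spectrum naturally yields one length-m weight vector per spectrum, i.e. an (M, m) matrix, which flattens to the length-Mm vector of the stacked formulation. A minimal sketch, assuming the eigenspectra basis is shared across grid points (all values are stand-ins):

```python
import numpy as np

M, Npix, m = 10, 3000, 4
eigenspectra = np.random.randn(Npix, m)  # shared basis, stand-in values
F = np.random.randn(M, Npix)             # flux grid

# lstsq solves min ||eigenspectra @ w - F[i]|| for every spectrum at once
w_hat, *_ = np.linalg.lstsq(eigenspectra, F.T, rcond=None)
w_hat = w_hat.T
assert w_hat.shape == (M, m)             # rank 2: one length-m MLE per grid spectrum
assert w_hat.ravel().shape == (M * m,)   # flattening recovers the stacked-vector form
```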

For the covariance matrix, I'm referring to equations A12/A13 in the appendix of the v2 paper. Instead of using a covariance tensor of shape (m, M, M), it uses a block diagonal of the m covariance matrices (from the GP kernel) to create an (Mm, Mm) matrix for the multivariate normal from which the M grid weights are sampled.
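
If it helps, the (m, M, M) stack and the (Mm, Mm) block diagonal carry the same information and are easy to convert between (kernel values here are stand-ins):

```python
import numpy as np
from scipy.linalg import block_diag

m, M = 3, 10
k_stack = np.stack([np.eye(M) * (i + 1) for i in range(m)])  # (m, M, M) GP kernels, stand-ins

big = block_diag(*k_stack)   # (Mm, Mm) block-diagonal form used in A12/A13
assert big.shape == (m * M, m * M)
```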

iancze commented 5 years ago

Ok, great. The dimensionality augmentations and reductions make the whole design of the emulator more complicated than it needs to be. But yes, at the end of the day it's just a (giant) multivariate normal likelihood calculating how well the reconstructed spectra (via the emulator) match the input spectral library.

Re: equation numbers, sorry, this is a little confusing because of the way emulateapj ca. 2015 and IOP formatted appendix equations. In the ApJ proofed edition the equation numbers run continuously, whereas in the v1 and v2 versions on arXiv they start again at A.1, etc. It seems like you're actually talking about Eqn 31 in the ApJ version, so sorry for the confusion. For this matrix, yes, you could write it in tensor form if that helps simplify the math.

mileslucas commented 5 years ago

So at the end of the day, it seems we can fairly easily rewrite A.14 (arXiv v2) using tensor notation. If we do that, is it still worth getting the MLE of w and reducing the dimensionality the way the paper does? If so, how do we do that?