For this I might need to define a custom training procedure. First, optimise the parametric part with the non-parametric part set to zero. At this point, initialise the non-parametric model to a non-zero value. Then fix the parametric part and optimise over the non-parametric part.
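A minimal sketch of that two-stage schedule, assuming a Keras-style model where `parametric_layer` and `nonparam_layer` are hypothetical attributes holding the two sub-models (the real class and attribute names may differ):

```python
import tensorflow as tf

def train_two_stages(model, dataset, param_epochs=20, nonparam_epochs=100):
    # Stage 1: zero and freeze the non-parametric part, optimise the parametric part.
    for w in model.nonparam_layer.weights:
        w.assign(tf.zeros_like(w))
    model.nonparam_layer.trainable = False
    model.parametric_layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2), loss='mse')
    model.fit(dataset, epochs=param_epochs)

    # Stage 2: re-initialise the non-parametric part to non-zero values,
    # freeze the parametric part and optimise only the non-parametric part.
    for w in model.nonparam_layer.weights:
        w.assign(tf.random.normal(tf.shape(w), stddev=1e-3))
    model.parametric_layer.trainable = False
    model.nonparam_layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1.0), loss='mse')
    model.fit(dataset, epochs=nonparam_epochs)
```

Re-compiling after toggling `trainable` is needed so that Keras picks up the new set of trainable weights.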
To consider:
After this is done, the next step is to define a non-parametric model that does not have polynomial variation. This implies that we need to handle the batch indexes: depending on which indexes the batch uses, we will optimise over only the corresponding columns of the A matrix (MCCD nomenclature).
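Something along these lines could handle the batch indexes, assuming the A matrix is stored as a single `tf.Variable` with one column of coefficients per training star (the sizes below are placeholders):

```python
import tensorflow as tf

n_eigen_opds = 10   # assumed number of eigenOPDs
n_stars = 500       # assumed total number of training stars

# A matrix (MCCD nomenclature): one column of coefficients per training star.
A = tf.Variable(tf.random.normal((n_eigen_opds, n_stars), stddev=1e-3))

def batch_coefficients(batch_star_idx):
    # batch_star_idx: 1-D int tensor with the global indexes of the batch stars.
    # Returns the (n_eigen_opds, batch_size) block of A used by this batch.
    return tf.gather(A, batch_star_idx, axis=1)
```

Only the gathered columns receive non-zero gradient contributions, so an optimisation step effectively touches only the columns of the stars present in the batch.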
Trying to model a complex dataset (Zernike order 45) with a lower number of Zernike polynomials. Both models were trained for 20 epochs.
Model Zernike order | train RMSE | test RMSE |
---|---|---|
15 | 2.7169e-04 | 2.5854e-04 |
45 | 2.0343e-04 | 2.0001e-04 |
Using polynomial variations right now for the eigenOPDs.
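For reference, a small NumPy sketch of what the polynomial field variation of the eigenOPDs looks like; the shapes and the `alpha` coefficient layout are assumptions for illustration, not the actual implementation:

```python
import numpy as np

def poly_features(x, y, poly_degree=3):
    # Monomials x**i * y**j with i + j <= poly_degree, for one field position (x, y).
    return np.array([x**i * y**j
                     for i in range(poly_degree + 1)
                     for j in range(poly_degree + 1 - i)])

def nonparam_opd(x, y, eigen_opds, alpha):
    # eigen_opds: (n_eigen, opd_dim, opd_dim) learned eigenOPDs
    # alpha:      (n_eigen, n_poly) polynomial coefficients per eigenOPD,
    #             with n_poly = (poly_degree + 1) * (poly_degree + 2) / 2
    feats = poly_features(x, y)                       # (n_poly,)
    weights = alpha @ feats                           # (n_eigen,) field-dependent weights
    return np.tensordot(weights, eigen_opds, axes=1)  # (opd_dim, opd_dim)
```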
Details of the model:
- Parametric part: Adam, lr=1e-2, epochs=20, Zernike order 15, max polynomial variation degree 2
- Non-parametric part: Adam, lr=1.0, epochs=100, max polynomial variation degree 3

Details of the dataset: Zernike order 45, max polynomial variation degree 2
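The same hyperparameters gathered in one place (the key names are illustrative and not the actual configuration format of the code):

```python
run_config = {
    'parametric': {'optimizer': 'Adam', 'lr': 1e-2, 'epochs': 20,
                   'zernike_order': 15, 'max_poly_degree': 2},
    'non_parametric': {'optimizer': 'Adam', 'lr': 1.0, 'epochs': 100,
                       'max_poly_degree': 3},
    'dataset': {'zernike_order': 45, 'max_poly_degree': 2},
}
```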
Model Zernike order | train RMSE | test RMSE |
---|---|---|
15 | 2.7169e-04 | 2.5854e-04 |
45 | 2.0343e-04 | 2.0001e-04 |
15 + nonparam | 9.5326e-05 | 9.3800e-05 |
Learned eigenOPDs
Learned alpha matrix
Loss function of parametric estimation
Loss function of non-parametric estimation
Before closing this issue I'll share more plots to analyse the first results comparing:
It's interesting to see that:
All the plots are for one random test star.
Try out whether we can add an additive non-parametric layer to the OPD.
Later we could see whether it is worth adding a non-parametric layer at the pixel level so that it can cope with effects happening at that level. Nevertheless, this would mean that we need a dataset that includes these effects.
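If we ever go that way, a minimal sketch of such a layer could be an additive, learnable correction applied to the generated pixel PSFs. The layer below is hypothetical (the name, shapes and the choice of a single shared correction map are assumptions), not something currently in the model:

```python
import tensorflow as tf

class PixelCorrectionLayer(tf.keras.layers.Layer):
    """Hypothetical additive pixel-level correction, shared by all stars."""

    def __init__(self, psf_size=32, **kwargs):
        super().__init__(**kwargs)
        # One learnable additive correction map, initialised to zero.
        self.correction = self.add_weight(
            name='pixel_correction',
            shape=(psf_size, psf_size),
            initializer='zeros',
            trainable=True)

    def call(self, psf_batch):
        # psf_batch: (batch, psf_size, psf_size) pixel PSFs produced by the OPD model.
        return psf_batch + self.correction[tf.newaxis, ...]
```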
How can we represent the variability of the PSF?
How to validate this approach?