Extend non-parametric part to a matrix factorisation scheme

tobias-liaudat commented 3 years ago

I should move to a full matrix factorisation of the form NP = S A instead of NP = S Pi (where Pi is not learned but computed from the positions as polynomials).

For this I need to handle the indices properly during training, as each observation corresponds to a column of A.
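A minimal NumPy sketch of the difference between the two schemes, mostly to make the indexing issue concrete. The array sizes, the flattened feature layout of S and the helper name `poly_features` are assumptions for illustration, not the actual wf-psf code.

```python
import numpy as np

def poly_features(positions, d_max):
    """Monomials x^i y^j with i + j <= d_max, one column per observation."""
    x, y = positions[:, 0], positions[:, 1]
    rows = [x**i * y**(deg - i) for deg in range(d_max + 1) for i in range(deg + 1)]
    return np.stack(rows, axis=0)  # shape (n_poly, n_obs)

rng = np.random.default_rng(0)
n_obs, n_features, n_opd = 8, 10, 16  # hypothetical sizes
positions = rng.uniform(-1, 1, size=(n_obs, 2))

# Previous scheme: NP = S Pi, where Pi is fixed by the positions.
Pi = poly_features(positions, d_max=3)   # (10, n_obs): 10 monomials for degree 3
S = rng.normal(size=(n_opd, n_features))  # features, flattened here for simplicity
NP_poly = S @ Pi

# Full factorisation: NP = S A, where A is learned, one column per observation.
A = rng.normal(size=(n_features, n_obs))

# During training a batch only touches its own columns of A,
# so the batch must carry the indices of its observations.
batch_idx = np.array([5, 0, 3])
NP_batch = S @ A[:, batch_idx]
```

The last two lines are the part that needs care: with a fixed Pi the batch only needs positions, whereas with a learned A it needs the observation indices.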

tobias-liaudat commented 3 years ago

To do:

[x] Build spatial_dictionary for the hybrid graph+polynomial spatial constraints (as in MCCD).
[x] Handling the batch dependent training with proper indexing. Done in commit
[x] Add L1 loss for alpha to the global loss. We need to enforce the new spatial constraints, and we do it by enforcing the sparsity of the alpha matrix on the spatial_dictionary. commit
[x] Add custom predict function. commit

Now the new prediction function is completely differentiable (not needed right now but could be useful for a side-project, if we want to incorporate knowledge on the galaxy images). It is based on a two-step interpolation:

Polynomial part: the interpolation is done using the natural position polynomial framework (as done in the polynomial-only flavour of the model).
Graph-constraint part: the interpolation is done using an RBF kernel interpolation (thin_plate), as was done in the original MCCD algorithm. The only difference is that we no longer use the k nearest neighbours for the interpolation; we use all the stars. This step is based on the tensorflow_addons tfa.image.interpolate_spline function.
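For reference, a rough NumPy sketch of the graph-constraint step: a thin-plate RBF interpolation of the per-star alpha coefficients that uses all the training stars. This is a stand-in written for illustration, not the tfa.image.interpolate_spline call itself; the function names, array sizes and the added linear term are assumptions.

```python
import numpy as np

def thin_plate_kernel(sq_dist):
    # phi(r) = r^2 log(r), written in terms of the squared distance d = r^2.
    return 0.5 * sq_dist * np.log(np.maximum(sq_dist, 1e-10))

def pairwise_sq_dist(a, b):
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff**2, axis=-1)

def fit_tps(train_pos, train_vals):
    """Solve for RBF weights w and linear-term weights v (exact interpolation)."""
    n, d = train_pos.shape
    K = thin_plate_kernel(pairwise_sq_dist(train_pos, train_pos))
    P = np.hstack([train_pos, np.ones((n, 1))])
    lhs = np.vstack([np.hstack([K, P]),
                     np.hstack([P.T, np.zeros((d + 1, d + 1))])])
    rhs = np.vstack([train_vals, np.zeros((d + 1, train_vals.shape[1]))])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]

def predict_tps(train_pos, w, v, query_pos):
    K = thin_plate_kernel(pairwise_sq_dist(query_pos, train_pos))
    P = np.hstack([query_pos, np.ones((query_pos.shape[0], 1))])
    return K @ w + P @ v

rng = np.random.default_rng(0)
n_stars, graph_features = 20, 10           # hypothetical sizes
star_pos = rng.uniform(0, 1, size=(n_stars, 2))
alpha_graph = rng.normal(size=(graph_features, n_stars))  # one column per star

# Interpolate every row of alpha_graph at new positions, using all the stars.
w, v = fit_tps(star_pos, alpha_graph.T)
new_pos = rng.uniform(0, 1, size=(4, 2))
alpha_new = predict_tps(star_pos, w, v, new_pos).T  # (graph_features, 4)
```

Every operation here is a linear solve or a matrix product, which is why the same step stays differentiable when written with TensorFlow ops.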

Right now the L1 norm is only applied to the alpha_graph matrix as the alpha_poly matrix is already naturally sparse due to the identity initialisation and the way the poly_dic is built.
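As a sketch of how the regularisation enters the global loss, assuming a mean-squared-error data term (the actual data term and the function name `total_loss` are assumptions here):

```python
import numpy as np

def total_loss(pred, target, alpha_graph, l1_rate):
    """Data term plus an L1 penalty on alpha_graph only (alpha_poly is left out)."""
    data_term = np.mean((pred - target) ** 2)
    l1_term = l1_rate * np.sum(np.abs(alpha_graph))
    return data_term + l1_term

# Tiny example: a perfect fit leaves only the penalty term.
target = np.zeros((2, 3))
alpha_graph = np.array([[1.0, -2.0], [0.0, 3.0]])
loss = total_loss(target, target, alpha_graph, l1_rate=0.5)
```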

What is still needed to be done:

[ ] Investigate an "optimal" optimisation strategy.
[x] How should we choose the l1_rate parameter for the L1 regularisation loss?

tobias-liaudat commented 3 years ago

Now that the job is done and working, I need to do some sanity checks to make sure that everything is working fine :)

[x] Check the performance of the new algorithm in the case of noisy stars with varying SNR.
[x] Compare the hybrid-MCCD flavour to the polynomial-only flavour of the semi-parametric wavefront model.

tobias-liaudat commented 3 years ago

Noiseless training stars

Results on the hybrid-MCCD flavour for two values of l1_rate. The Colab notebook for the test with l1_rate=0.0 can be found in this commit.

To clarify: tr = train dataset, te = test dataset, pix = comparison of the pixel values of the polychromatic PSFs, OPD = comparison of the wavefront values.
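For clarity on what the RMSE columns measure, a plain RMSE can be sketched as below. This assumes an unnormalised root-mean-square error over all pixels (or all OPD samples); the metric actually used for the tables may include a different normalisation.

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error over every element of the two arrays."""
    return np.sqrt(np.mean((pred - truth) ** 2))

# pix: compare PSF images; OPD: compare wavefront maps. Same formula, different inputs.
example = rmse(np.ones((4, 8, 8)), np.zeros((4, 8, 8)))
```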

Common parameters:

d_max = 2 # parametric part
d_max_nonparam = 3 # polynomial-constraint features
graph_features = 10 # Graph-constraint features

Both of them share the training cycle. Cycle_1:

l_rate_param=1e-2
l_rate_non_param=1.0
n_epochs_param=20
n_epochs_non_param=140

Cycle_2:

l_rate_param=1e-2
l_rate_non_param=1.0
n_epochs_param=30
n_epochs_non_param=140
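The two cycles above can be unrolled into a flat list of training phases, as in this sketch. It assumes that within each cycle the parametric part is trained first and the non-parametric part second; the helper `training_phases` is hypothetical.

```python
cycles = [
    dict(l_rate_param=1e-2, l_rate_non_param=1.0, n_epochs_param=20, n_epochs_non_param=140),
    dict(l_rate_param=1e-2, l_rate_non_param=1.0, n_epochs_param=30, n_epochs_non_param=140),
]

def training_phases(cycles):
    """Return (cycle number, part, learning rate, epochs) for each phase, in order."""
    phases = []
    for k, c in enumerate(cycles, start=1):
        phases.append((k, "param", c["l_rate_param"], c["n_epochs_param"]))
        phases.append((k, "non_param", c["l_rate_non_param"], c["n_epochs_non_param"]))
    return phases

phases = training_phases(cycles)
total_epochs = sum(p[3] for p in phases)
```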

Semi-parametric MCCD with `l1_rate=0.0`

model	te pix RMSE	te OPD RMSE
semi15_MCCD_cycle1	4.9919e-05	8.4276e-02
semi15_MCCD_cycle2	2.6158e-05	8.7374e-02

Semi-parametric MCCD with `l1_rate=1e-6`

model	te pix RMSE	te OPD RMSE
semi15_MCCD_cycle1	7.3274e-05	9.4550e-02
semi15_MCCD_cycle2	4.9498e-05	1.1070e-01

Semi-parametric polynomial

model	te pix RMSE	te OPD RMSE
semi15_poly_cycle1	3.4385e-05	1.1082e-01
semi15_poly_cycle2	1.7849e-05	1.0976e-01

Parametric model

model	te pix RMSE	te OPD RMSE
param45_cycle1	1.7352e-04	...
param45_cycle2	1.9846e-05	...

tobias-liaudat commented 3 years ago

Noisy training stars

Using stars with different SNR values. The SNR of each star is drawn uniformly at random from the range 10 to 70.
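A sketch of how such a noisy set could be generated. The SNR definition is an assumption: here it is the ratio between the L2 norm of the clean star and the expected L2 norm of the added Gaussian noise, which may differ from the definition used for these experiments. The images are random placeholders.

```python
import numpy as np

def add_noise_at_snr(star, snr, rng):
    """Add white Gaussian noise so that ||star|| / E||noise|| is roughly snr."""
    sigma = np.linalg.norm(star) / (snr * np.sqrt(star.size))
    return star + rng.normal(scale=sigma, size=star.shape)

rng = np.random.default_rng(0)
stars = rng.random((5, 32, 32))                  # placeholder clean stars
snrs = rng.uniform(10, 70, size=stars.shape[0])  # one SNR per star
noisy = np.stack([add_noise_at_snr(s, r, rng) for s, r in zip(stars, snrs)])
```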

Example of stars

Results on the hybrid-MCCD flavour for two values of l1_rate.

To clarify: tr = train dataset, te = test dataset, pix = comparison of the pixel values of the polychromatic PSFs, OPD = comparison of the wavefront values.

Common parameters:

d_max = 2 # parametric part
d_max_nonparam = 3 # polynomial-constraint features
graph_features = 10 # Graph-constraint features

Both of them share the training cycle. Cycle_1:

l_rate_param=1e-2
l_rate_non_param=1.0
n_epochs_param=20
n_epochs_non_param=140

Cycle_2:

l_rate_param=1e-2
l_rate_non_param=1.0
n_epochs_param=30
n_epochs_non_param=140

Parametric model

n_epochs=60

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
param45_cycle	2.3422e-04	2.2259e-04	3.8499e-02	3.5063e-02

Semi-parametric polynomial `d=3`

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_poly_cycle1	3.4425e-05	3.6799e-05	7.2906e-02	7.5333e-02
semi15_poly_cycle2	2.2044e-05	2.3365e-05	7.3980e-02	7.6225e-02

Semi-parametric polynomial `d=5`

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_poly_cycle1	6.7209e-05	7.2750e-05	1.0547e-01	1.0438e-01
semi15_poly_cycle2	4.9008e-05	5.6119e-05	1.2015e-01	1.1854e-01

Semi-parametric MCCD with `l1_rate=0.0`

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_MCCD_cycle1	7.0474e-05	7.0349e-05	1.1562e-01	1.1411e-01
semi15_MCCD_cycle2	4.7397e-05	4.8463e-05	1.3517e-01	1.3273e-01

Semi-parametric MCCD with `l1_rate=1e-6`

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_MCCD_cycle1	5.1722e-05	5.5189e-05	1.2679e-01	1.2603e-01
semi15_MCCD_cycle2	3.3969e-05	3.5610e-05	1.3899e-01	1.3862e-01

Semi-parametric MCCD with `l1_rate` decreasing strategy `v1`

Starting l1_rate=1e-6. Update_rule: divide l1_rate by 2 every 10 epochs of the non-parametric update.
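The update rule can be written as a small step schedule, sketched below. The function name is hypothetical, and it assumes the first halving happens once 10 non-parametric epochs have been completed.

```python
def l1_rate_at(epoch, start_rate=1e-6, factor=2.0, every=10):
    """l1_rate for a given non-parametric epoch (0-based), halved every `every` epochs."""
    return start_rate / factor ** (epoch // every)

# With 140 non-parametric epochs the rate is halved 13 times by the last epoch.
rates = [l1_rate_at(e) for e in range(140)]
```

The `v2` strategy would be the same schedule with `start_rate=1e-8`.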

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_MCCD_cycle1	3.8146e-05	4.0841e-05	9.9933e-02	1.0379e-01
semi15_MCCD_cycle2	2.3262e-05	2.4682e-05	9.8595e-02	1.0248e-01

Semi-parametric MCCD with `l1_rate` decreasing strategy `v2`

Starting l1_rate=1e-8. Update_rule: divide l1_rate by 2 every 10 epochs of the non-parametric update.

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_MCCD_cycle1	3.7249e-05	3.9289e-05	8.1048e-02	8.3285e-02
semi15_MCCD_cycle2	2.2757e-05	2.3915e-05	8.1366e-02	8.3506e-02

tobias-liaudat commented 3 years ago

Matrices learned by the Semi-parametric MCCD with `l1_rate` decreasing strategy `v2`

Alpha weights

Polynomial variations and Graph variations, respectively.

S features

Polynomial variations.
Graph variations.

tobias-liaudat commented 3 years ago

Error maps for Semi-parametric MCCD with `l1_rate` decreasing strategy `v2`

Absolute pixel error

Relative pixel error

Absolute OPD error

tobias-liaudat commented 3 years ago

Following steps:

[x] Try changing the l1 loss to a l1.1 penalisation.
[x] Try the model using only the graph spatial variations for the non-parametric part.

tobias-liaudat commented 3 years ago

Lp with `p=1.1` loss

Using `l1_rate` with decay strategy

Starting l1_rate=1e-6. Update_rule: divide l1_rate by 2 every 10 epochs of the non-parametric update. The resulting alpha matrix for the graph constraint is not sparse.

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_poly_cycle1	6.7778e-05	6.8381e-05	1.0746e-01	1.0718e-01
semi15_poly_cycle2	4.4738e-05	4.5292e-05	1.2387e-01	1.2351e-01

Using constant `l1_rate`

Starting l1_rate=1e-8. No update strategy.

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_poly_cycle1	3.6967e-05	3.9360e-05	1.0769e-01	1.0760e-01
semi15_poly_cycle2	2.3933e-05	2.5053e-05	1.0453e-01	1.0421e-01
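The Lp penalisation tested in this comment can be sketched as follows, assuming the penalty is the sum of |alpha|^p over the entries of the graph alpha matrix (the exact form used in the code may differ, and `lp_penalty` is a hypothetical name).

```python
import numpy as np

def lp_penalty(alpha, p=1.1):
    """Sum of |alpha_ij|^p; p=1 recovers the L1 penalty."""
    return np.sum(np.abs(alpha) ** p)

def lp_penalty_grad(alpha, p=1.1):
    """Gradient of the penalty with respect to alpha (valid for p > 1)."""
    return p * np.abs(alpha) ** (p - 1) * np.sign(alpha)

alpha = np.array([[1.0, -2.0], [0.0, 1e-6]])
penalty = lp_penalty(alpha)
grad = lp_penalty_grad(alpha)
```

For any p > 1 the gradient goes to zero as an entry approaches zero, so small coefficients are shrunk less and less instead of being set exactly to zero. That is one possible explanation for the non-sparse alpha matrix observed with the decay strategy.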

tobias-liaudat commented 3 years ago

Graph-only semiparametric model

Using `l1_rate` with decay strategy `v2`

Starting l1_rate=1e-8. Update_rule: divide l1_rate by 2 every 10 epochs of the non-parametric update.

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_poly_cycle1	2.9756e-04	2.7289e-04	1.2064e-01	1.2233e-01
semi15_poly_cycle2	1.8439e-04	1.8128e-04	1.5742e-01	1.6056e-01

Alpha matrix

A matrix

S features

Error maps

tobias-liaudat commented 3 years ago

Smaller alternations

Each parametric cycle runs for 30 iterations and each non-parametric cycle for 60 iterations.

Best polynomial semiparametric

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_cycle1	1.3445e-04	1.3151e-04	...	...
semi15_cycle2	9.4471e-05	9.4260e-05	...	...
semi15_cycle3	7.4924e-05	7.4810e-05	...	...
semi15_cycle4	6.2410e-05	6.3090e-05	...	...

Best mccd-type

model	tr pix RMSE	te pix RMSE	tr OPD RMSE	te OPD RMSE
semi15_cycle1	1.2268e-04	1.2068e-04	...	...
semi15_cycle2	7.4799e-05	7.6583e-05	...	...
semi15_cycle3	5.2799e-05	5.5683e-05	...	...
semi15_cycle4	4.0976e-05	4.3584e-05	...	...

tobias-liaudat commented 3 years ago

Closing this issue as it achieved its objective.
