Closed tobias-liaudat closed 3 years ago
To do:
Now the new prediction function is completely differentiable (not needed right now but could be useful for a side-project, if we want to incorporate knowledge on the galaxy images). It is based on a two-step interpolation:
Right now the L1 norm is only applied to the `alpha_graph` matrix, as the `alpha_poly` matrix is already naturally sparse due to the identity initialisation and the way the `poly_dic` is built.
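As a rough illustration of the point above, here is a minimal NumPy sketch in which the L1 penalty touches only the graph coefficients. The function names (`l1_penalty`, `total_loss`) and the plain-array representation are assumptions made for this example, not the actual training code; only `alpha_graph`, `alpha_poly` and `l1_rate` come from the discussion.

```python
import numpy as np


def l1_penalty(alpha_graph: np.ndarray, l1_rate: float) -> float:
    """Return l1_rate * sum(|alpha_graph|), the sparsity-promoting term."""
    return float(l1_rate * np.sum(np.abs(alpha_graph)))


def total_loss(data_loss: float,
               alpha_graph: np.ndarray,
               alpha_poly: np.ndarray,
               l1_rate: float) -> float:
    """Hypothetical total loss: data term plus L1 on alpha_graph only.

    alpha_poly is accepted to make the asymmetry explicit, but it is
    deliberately left unpenalised.
    """
    return data_loss + l1_penalty(alpha_graph, l1_rate)


if __name__ == "__main__":
    a_graph = np.array([[1.0, -2.0], [0.0, 3.0]])
    a_poly = np.eye(2)  # identity initialisation, already sparse
    print(total_loss(1.0, a_graph, a_poly, l1_rate=0.5))
```

Setting `l1_rate=0.0` in this sketch removes the penalty entirely, which matches the unregularised baseline reported in the tables.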
What still needs to be done:
`l1_rate` parameter for the L1 regularisation loss.
Now that the job is done and working, I need to do some sanity checks to make sure that everything is working fine :)
Results on the hybrid-MCCD flavour for two values of `l1_rate`. The colab notebook for the test with `l1_rate=0.0` can be found in this commit.
To clarify: tr = train dataset, te = test dataset, pix = comparison of the pixel values of the polychromatic PSFs, OPD = comparison of the wavefront values.
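For reference, a plain RMSE over all elements can be sketched as below. This assumes the metric is an unnormalised root-mean-square difference taken over every pixel (or every OPD sample) of every star; the exact normalisation used for the numbers in the tables is not stated here, and the function name `rmse` is chosen only for this example.

```python
import numpy as np


def rmse(prediction, reference) -> float:
    """Root-mean-square error over all elements of two same-shaped arrays."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Two stacks of 3 "stars" of 4x4 pixels, differing by a constant offset.
    ref = np.zeros((3, 4, 4))
    pred = np.full((3, 4, 4), 2.0)
    print(rmse(pred, ref))
```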
Common parameters:
Both of them share the training cycle. Cycle_1:
Cycle_2:
l1_rate=0.0
model | te pix RMSE | te OPD RMSE |
---|---|---|
semi15_MCCD_cycle1 | 4.9919e-05 | 8.4276e-02 |
semi15_MCCD_cycle2 | 2.6158e-05 | 8.7374e-02 |
l1_rate=1e-6
model | te pix RMSE | te OPD RMSE |
---|---|---|
semi15_MCCD_cycle1 | 7.3274e-05 | 9.4550e-02 |
semi15_MCCD_cycle2 | 4.9498e-05 | 1.1070e-01 |

model | te pix RMSE | te OPD RMSE |
---|---|---|
semi15_poly_cycle1 | 3.4385e-05 | 1.1082e-01 |
semi15_poly_cycle2 | 1.7849e-05 | 1.0976e-01 |

model | te pix RMSE | te OPD RMSE |
---|---|---|
param45_cycle1 | 1.7352e-04 | ... |
param45_cycle2 | 1.9846e-05 | ... |
Using stars with different SNR values. The SNR of each star is drawn uniformly at random from the range 10 to 70.
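A minimal sketch of this sampling, assuming NumPy's random generator. The helper names (`draw_snr`, `noise_sigma`, `add_noise`) are hypothetical, and the SNR definition used to convert an SNR into a noise level (`sigma = ||star||_2 / (sqrt(n_pixels) * SNR)`) is an assumption for illustration; the definition actually used to build the dataset is not given in this thread.

```python
import numpy as np


def draw_snr(n_stars: int, snr_min: float = 10.0, snr_max: float = 70.0,
             seed=None) -> np.ndarray:
    """Draw one SNR per star, uniformly in [snr_min, snr_max)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(snr_min, snr_max, size=n_stars)


def noise_sigma(star: np.ndarray, snr: float) -> float:
    """Noise std for a target SNR (ASSUMED definition, see lead-in)."""
    return float(np.linalg.norm(star) / (np.sqrt(star.size) * snr))


def add_noise(star: np.ndarray, snr: float, rng: np.random.Generator) -> np.ndarray:
    """Return the star with white Gaussian noise at the requested SNR."""
    return star + rng.normal(0.0, noise_sigma(star, snr), size=star.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stars = np.ones((5, 32, 32))          # placeholder noiseless stars
    snrs = draw_snr(len(stars), seed=0)   # one SNR per star
    noisy = np.stack([add_noise(s, r, rng) for s, r in zip(stars, snrs)])
    print(snrs.round(1), noisy.shape)
```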
Results on the hybrid-MCCD flavour for two values of `l1_rate`.
To clarify: tr = train dataset, te = test dataset, pix = comparison of the pixel values of the polychromatic PSFs, OPD = comparison of the wavefront values.
Common parameters:
Both of them share the training cycle. Cycle_1:
Cycle_2:
n_epochs=60
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
param45_cycle | 2.3422e-04 | 2.2259e-04 | 3.8499e-02 | 3.5063e-02 |
d=3
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_poly_cycle1 | 3.4425e-05 | 3.6799e-05 | 7.2906e-02 | 7.5333e-02 |
semi15_poly_cycle2 | 2.2044e-05 | 2.3365e-05 | 7.3980e-02 | 7.6225e-02 |
d=5
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_poly_cycle1 | 6.7209e-05 | 7.2750e-05 | 1.0547e-01 | 1.0438e-01 |
semi15_poly_cycle2 | 4.9008e-05 | 5.6119e-05 | 1.2015e-01 | 1.1854e-01 |
l1_rate=0.0
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_MCCD_cycle1 | 7.0474e-05 | 7.0349e-05 | 1.1562e-01 | 1.1411e-01 |
semi15_MCCD_cycle2 | 4.7397e-05 | 4.8463e-05 | 1.3517e-01 | 1.3273e-01 |
l1_rate=1e-6
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_MCCD_cycle1 | 5.1722e-05 | 5.5189e-05 | 1.2679e-01 | 1.2603e-01 |
semi15_MCCD_cycle2 | 3.3969e-05 | 3.5610e-05 | 1.3899e-01 | 1.3862e-01 |
`l1_rate` decreasing strategy v1
Starting `l1_rate=1e-6`
Update rule: divide `l1_rate` by 2 every 10 epochs of the non-parametric update
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_MCCD_cycle1 | 3.8146e-05 | 4.0841e-05 | 9.9933e-02 | 1.0379e-01 |
semi15_MCCD_cycle2 | 2.3262e-05 | 2.4682e-05 | 9.8595e-02 | 1.0248e-01 |
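The halving rule used by the decreasing strategy can be written as a small step schedule. This is only a sketch: the function name `l1_rate_at_epoch` is hypothetical, and it assumes that `epoch` counts epochs of the non-parametric update starting from 0. Whether the counter restarts at each training cycle is not specified in this thread.

```python
def l1_rate_at_epoch(epoch: int, start_rate: float = 1e-6,
                     factor: float = 2.0, every: int = 10) -> float:
    """Step decay: divide start_rate by `factor` once per `every` epochs."""
    return start_rate / factor ** (epoch // every)


if __name__ == "__main__":
    # Print the rate at the start of each 10-epoch block of a 60-epoch run.
    for epoch in range(0, 60, 10):
        print(epoch, l1_rate_at_epoch(epoch))
```

Changing `start_rate` to `1e-8` gives the same schedule shape with a smaller initial penalty.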
`l1_rate` decreasing strategy v2
Starting `l1_rate=1e-8`
Update rule: divide `l1_rate` by 2 every 10 epochs of the non-parametric update
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_MCCD_cycle1 | 3.7249e-05 | 3.9289e-05 | 8.1048e-02 | 8.3285e-02 |
semi15_MCCD_cycle2 | 2.2757e-05 | 2.3915e-05 | 8.1366e-02 | 8.3506e-02 |
`l1_rate` decreasing strategy v2
Polynomial variations.
Graph variations.
`l1_rate` decreasing strategy v2
Following steps:
`p=1.1` loss
`l1_rate` with decay strategy
Starting `l1_rate=1e-6`.
Update rule: divide `l1_rate` by 2 every 10 epochs of the non-parametric update.
The resulting alpha matrix for the graph constraint is not sparse.
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_poly_cycle1 | 6.7778e-05 | 6.8381e-05 | 1.0746e-01 | 1.0718e-01 |
semi15_poly_cycle2 | 4.4738e-05 | 4.5292e-05 | 1.2387e-01 | 1.2351e-01 |
`l1_rate`
Starting `l1_rate=1e-8`.
No update strategy.
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_poly_cycle1 | 3.6967e-05 | 3.9360e-05 | 1.0769e-01 | 1.0760e-01 |
semi15_poly_cycle2 | 2.3933e-05 | 2.5053e-05 | 1.0453e-01 | 1.0421e-01 |
`l1_rate` with decay strategy v2
Starting `l1_rate=1e-8`.
Update rule: divide `l1_rate` by 2 every 10 epochs of the non-parametric update.
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_poly_cycle1 | 2.9756e-04 | 2.7289e-04 | 1.2064e-01 | 1.2233e-01 |
semi15_poly_cycle2 | 1.8439e-04 | 1.8128e-04 | 1.5742e-01 | 1.6056e-01 |
Each parametric cycle is `30` iterations and each non-parametric cycle is `60` iterations.
model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_cycle1 | 1.3445e-04 | 1.3151e-04 | ... | ... |
semi15_cycle2 | 9.4471e-05 | 9.4260e-05 | ... | ... |
semi15_cycle3 | 7.4924e-05 | 7.4810e-05 | ... | ... |
semi15_cycle4 | 6.2410e-05 | 6.3090e-05 | ... | ... |

model | tr pix RMSE | te pix RMSE | tr OPD RMSE | te OPD RMSE |
---|---|---|---|---|
semi15_cycle1 | 1.2268e-04 | 1.2068e-04 | ... | ... |
semi15_cycle2 | 7.4799e-05 | 7.6583e-05 | ... | ... |
semi15_cycle3 | 5.2799e-05 | 5.5683e-05 | ... | ... |
semi15_cycle4 | 4.0976e-05 | 4.3584e-05 | ... | ... |
Closing this issue as it achieved its objective.
I should move to a full matrix factorisation of the form NP = S A instead of NP = S Pi (where the Pi are not learned but computed from the positions as polynomials).
For this I need to properly handle the indices during training, as each observation corresponds to a column of A.
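To make the index handling concrete, here is a simplified NumPy sketch contrasting the two variants. Everything in it is an assumption for illustration: S is reduced to a plain 2-D matrix, the monomial ordering in `poly_features` is arbitrary, and the function names are hypothetical. The point is only that with NP = S Pi the columns are recomputed from positions, whereas with NP = S A each batch must carry the indices of its observations to select the matching columns of A.

```python
import numpy as np


def poly_features(positions: np.ndarray, d: int) -> np.ndarray:
    """Pi: monomials x^(k-j) * y^j for all degrees k <= d, one column per star.

    positions has shape (n_obs, 2); the result has shape (n_poly, n_obs)
    with n_poly = (d + 1) * (d + 2) / 2.
    """
    x, y = positions[:, 0], positions[:, 1]
    feats = [x ** (k - j) * y ** j for k in range(d + 1) for j in range(k + 1)]
    return np.stack(feats, axis=0)


def np_from_poly(S: np.ndarray, positions: np.ndarray, d: int) -> np.ndarray:
    """NP = S Pi: weights are a fixed function of position, nothing to index."""
    return S @ poly_features(positions, d)


def np_from_full(S: np.ndarray, A: np.ndarray, obs_idx: np.ndarray) -> np.ndarray:
    """NP = S A: A is learned, so a batch selects its columns by observation index."""
    return S @ A[:, obs_idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs, n_feat, n_out = 8, 6, 4
    S = rng.normal(size=(n_out, n_feat))
    A = rng.normal(size=(n_feat, n_obs))   # one learned column per observation
    batch_idx = np.array([5, 0, 3])        # indices of the stars in this batch
    print(np_from_full(S, A, batch_idx).shape)
```

One consequence of the full factorisation, visible in the sketch, is that A has no column for a position that was never observed, so predicting at new positions needs a separate step.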