SteveOv / ebop_maven

EBOP Model Automatic input Value Estimation Neural network
GNU General Public License v3.0

Investigate predictions on "swap" synthetic test dataset #80

Open SteveOv opened 3 months ago

SteveOv commented 3 months ago

Investigate the phenomenon shown below, which occurs when we test against a synthetic test dataset in which instances are swapped whenever the original secondary eclipse is found to be deeper than the primary.

[Figures: All instances | Transiting instances (predictions-nonmc-vs-labels, synthetic-mist-tess-dataset with swap)]
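
For context, the swap applied when building this test dataset amounts to something like the following (a minimal sketch: the field names `depth_pri`, `depth_sec`, `rA` and `rB` are illustrative assumptions, not the project's actual code):

```python
def maybe_swap(inst: dict) -> dict:
    """Relabel the stars so that the deeper eclipse is always the primary.

    A minimal sketch: the field names are illustrative assumptions.
    """
    inst = dict(inst)                      # work on a copy
    inst["swapped"] = inst["depth_sec"] > inst["depth_pri"]
    if inst["swapped"]:
        # Exchange star A and star B, inverting the ratio parameters.
        inst["rA"], inst["rB"] = inst["rB"], inst["rA"]
        inst["k"] = 1 / inst["k"]          # k = rB / rA
        inst["J"] = 1 / inst["J"]          # J = SB / SA
        inst["depth_pri"], inst["depth_sec"] = inst["depth_sec"], inst["depth_pri"]
    return inst
```
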
SteveOv commented 3 months ago

Added a `swapped` field to the saved CSVs with 81a257d.

SteveOv commented 3 months ago

Can confirm that this is definitely down to the instances that have been swapped.

[Figures: Swapped instances | Non-swapped instances (predictions-nonmc-vs-labels, synthetic-mist-tess-dataset with swap)]
SteveOv commented 3 months ago

By combining the swapped and transiting criteria, we see that the majority of the poorly predicted instances are both swapped and transiting.

[Figures: Swapped & transiting instances | Swapped & non-transiting instances]
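
The subsets above can be pulled out of the saved predictions with a boolean mask over the new `swapped` column (a sketch; the CSV file name and a `transiting` column are assumptions):

```python
import pandas as pd

# Hypothetical file name; use whichever CSV the testing step actually saves.
preds = pd.read_csv("predictions-nonmc-vs-labels-synthetic-mist-tess-dataset-swap.csv")

swapped_transiting = preds[preds["swapped"] & preds["transiting"]]
swapped_non_transiting = preds[preds["swapped"] & ~preds["transiting"]]
print(len(swapped_transiting), len(swapped_non_transiting))
```
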
SteveOv commented 3 months ago

This doesn't completely resolve the issue, but I've found that I've been handling the change in `bP` and `bS` incorrectly when switching components. They need to be recalculated, as they must relate to the newly assigned star A.
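
For reference, the recalculation amounts to re-evaluating the impact parameters against the fractional radius of the newly assigned star A, with ω shifted by 180° because the star labels have been exchanged. A rough sketch under those assumptions (the label names `rA`, `inc`, `ecosw` and `esinw` are illustrative):

```python
import numpy as np

def impact_params(r_a, inc_deg, ecosw, esinw):
    """bP and bS relative to star A:
    bP = cos(i) / rA * (1 - e^2) / (1 + e*sin(w))
    bS = cos(i) / rA * (1 - e^2) / (1 - e*sin(w))
    """
    e_sq = ecosw ** 2 + esinw ** 2
    cosi = np.cos(np.radians(inc_deg))
    return (cosi / r_a * (1 - e_sq) / (1 + esinw),
            cosi / r_a * (1 - e_sq) / (1 - esinw))

def recalc_impact_params_after_swap(inst: dict) -> dict:
    """Recalculate bP/bS for an instance whose stars have just been swapped.

    Assumes rA already refers to the newly assigned star A. Relabelling the
    stars shifts omega by 180 deg, so ecosw and esinw change sign before the
    impact parameters are re-evaluated.
    """
    inst = dict(inst)
    inst["ecosw"], inst["esinw"] = -inst["ecosw"], -inst["esinw"]
    inst["bP"], inst["bS"] = impact_params(
        inst["rA"], inst["inc"], inst["ecosw"], inst["esinw"])
    return inst
```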

SteveOv commented 3 months ago

Tried training models on datasets with swap enabled. Invariably, these improved the results on the synthetic test set with swap enabled at the expense of the one without it; however, the net result was significantly worse predictions overall.

The following shows the predictions for k against both the "swap" and "non-swap" synthetic-mist-tess-datasets with a model trained on a 100k dataset with swapped instances (without additional restrictions on k, J or qphot):

[Figures: predictions for k — synth test dataset with swap | synth test dataset without swap]
For the "swap" model (trained on 100k train/val instances without swap): test dataset all instances transiting non-transiting
synth test dataset with swap (k<=10) 0.060 041 0.124 873 0.042 900
synth test dataset without swap 0.061 276 0.092 278 0.053 035
formal test dataset (effectively with swap) 0.064 847 0.107 058 0.0530122
For the control model (trained on 100k train/val instances without swap):

| test dataset | all instances | transiting | non-transiting |
| --- | --- | --- | --- |
| synth test dataset with swap (k<=10) | 0.071 529 | 0.179 780 | 0.042 909 |
| synth test dataset without swap | 0.040 515 | 0.063 774 | 0.034 333 |
| formal test dataset (effectively with swap) | 0.050 801 | 0.077 074 | 0.043 503 |
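
For reference, the per-subset figures above can be reproduced from the saved predictions with something like this (a sketch that assumes the tabulated metric is the mean absolute error on k and that the CSV carries `k_pred`, `k_label` and `transiting` columns, which may not match the project's actual names):

```python
import pandas as pd

# Hypothetical file and column names.
preds = pd.read_csv("predictions-synthetic-mist-tess-dataset-swap.csv")
abs_err = (preds["k_pred"] - preds["k_label"]).abs()

print("all instances: ", abs_err.mean())
print("transiting:    ", abs_err[preds["transiting"]].mean())
print("non-transiting:", abs_err[~preds["transiting"]].mean())
```
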
SteveOv commented 1 month ago

Returning to this with models trained with the mags feature centred on the midpoint between the eclipses and roll (augmentation) <= 512 bins.
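
As a rough illustration of that preprocessing, assuming the mags feature is a phase-folded, binned light curve held as a 1-d array with the primary eclipse initially at phase 0 (the function and parameter names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def centre_and_roll(mags: np.ndarray, phase_sec: float, max_roll: int = 512) -> np.ndarray:
    """Centre the phase-folded mags feature on the midpoint between the two
    eclipses, then apply a random roll augmentation of at most max_roll bins.

    A sketch: assumes the primary eclipse is at phase 0 (bin 0) and that
    phase_sec is the phase of the secondary eclipse, so the midpoint between
    the eclipses lies at phase phase_sec / 2.
    """
    nbins = len(mags)
    midpoint_bin = int(round((phase_sec / 2) * nbins)) % nbins
    centred = np.roll(mags, nbins // 2 - midpoint_bin)   # midpoint -> centre bin
    return np.roll(centred, rng.integers(-max_roll, max_roll + 1))
```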