CosmoStat / wf-psf

Data-driven wavefront-based PSF modelling framework.
MIT License

Add a train_test_split.py module for dataset creation #124

Open jeipollack opened 6 months ago

jeipollack commented 6 months ago

WaveDiff v.2.0.x is missing a module to create simulated datasets for training and validation. This issue is to discuss the development of this module. It may require refactoring of the data_config.yaml file.

jeipollack commented 6 months ago

@tobias-liaudat do the training and test datasets require different values for the parameters (e.g. SEDs, Zernike coeffs, etc.), as is currently the case in the data_config.yaml file? Note, I am asking specifically about the duplicated entries.

This would imply that different SEDs, Zernikes, spatial variations, etc. could be used for each dataset. If not, then I am wondering whether a single set of parameters should be specified to generate a single dataset, which is then split by some fraction defined in the config file. There could be other parameters, like adding noise, etc.
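To make the single-dataset option concrete, here is a minimal sketch of such a split. It is not taken from the wf-psf code: the function name `split_dataset`, the `test_fraction` and `seed` arguments, and the toy array shapes are all assumptions for illustration. The dictionary keys reuse the names from the data_config.yaml entries quoted further down.

```python
import numpy as np


def split_dataset(dataset, test_fraction=0.2, seed=0):
    """Split every array in `dataset` along axis 0 with one shared permutation.

    `dataset` is assumed to be a dict of NumPy arrays that all have the
    same length along their first axis (one entry per star).
    """
    lengths = {len(arr) for arr in dataset.values()}
    if len(lengths) != 1:
        raise ValueError("All arrays must have the same number of stars.")
    n_stars = lengths.pop()

    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_stars)
    n_test = int(round(test_fraction * n_stars))
    test_idx, train_idx = perm[:n_test], perm[n_test:]

    # The same indices are applied to every array so stars, positions and
    # coefficients stay aligned after the split.
    train = {key: arr[train_idx] for key, arr in dataset.items()}
    test = {key: arr[test_idx] for key, arr in dataset.items()}
    return train, test


# Toy data with hypothetical shapes: 10 stars, 32x32 pixels, 15 Zernike coeffs.
n = 10
dataset = {
    "positions": np.arange(2 * n, dtype=float).reshape(n, 2),
    "stars": np.zeros((n, 32, 32)),
    "zernike_coeffs": np.zeros((n, 15)),
}
train, test = split_dataset(dataset, test_fraction=0.2, seed=42)
```

With this approach the fraction (and the seed, for reproducibility) would be the only split-specific entries needed in the config file.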

Btw what does SR mean in the following?

# Gaussian noise for training stars
SNR_range = [10, 110]
# Parameters for the SR in the test dataset
SR_output_dim = 64
SR_output_Q = 1.0

Also what is the purpose of defining:

    stars: null
    noisy_stars: null
    positions: null
    zernike_coeffs: null
    polynomial_coeffs: null

? Maybe I added it as a reminder to myself, but I don't know what values would go there. Would it be the name(s) of the corresponding file?

tobias-liaudat commented 6 months ago

@jeipollack I don't recall how the parameters with null are handled in the new code. Is it a path to a .npy file?

SR stands for super-resolved; those parameters are the ones that change the PSF simulator so that it generates super-resolved stars.

In the original code, at the beginning, the super-resolved (SR) stars were generated on the fly, as it was fast on the GPU with the parameters I was using. This allowed us to have a lightweight test/train .npy file. However, depending on the parameters, generating the SR stars may take a long time, and you may want to generate them only once and then load the stars from the .npy file.
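The generate-once-then-load pattern could look roughly like the sketch below. Everything here is hypothetical: `simulate_sr_stars` is a random-array placeholder for the real PSF simulator, and `load_or_generate` and the file name are made up for illustration. Only the output size of 64 comes from the SR_output_dim value quoted above.

```python
import os
import tempfile

import numpy as np

SR_OUTPUT_DIM = 64  # same value as SR_output_dim in the quoted config


def simulate_sr_stars(n_stars=4, seed=0):
    """Hypothetical placeholder for the (possibly slow) SR star simulation."""
    rng = np.random.default_rng(seed)
    return rng.random((n_stars, SR_OUTPUT_DIM, SR_OUTPUT_DIM))


def load_or_generate(path, generate_fn):
    """Load stars from `path` if the file exists; otherwise generate and save."""
    if os.path.exists(path):
        return np.load(path)
    stars = generate_fn()
    np.save(path, stars)
    return stars


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "sr_stars.npy")
    first = load_or_generate(cache_path, simulate_sr_stars)   # simulates and saves
    second = load_or_generate(cache_path, simulate_sr_stars)  # loads from disk
```

The trade-off is the one described above: generating on the fly keeps the .npy files small, while caching avoids repeating an expensive simulation.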

In the usual usage of WaveDiff, we are not interested in having different parameters for the train/test stars. However, to carry out sensitivity tests, i.e. to study how errors in the input affect the PSF model after training, we will need to have different parameters for train and test.
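One way to support both cases would be a shared parameter set with optional per-split overrides, sketched below. This is only an illustration of the idea, not the existing config handling: `resolve_split_params`, the key `n_zernikes`, and the values 15 and 45 are assumptions. The `snr_range` value mirrors the SNR_range quoted earlier in the thread.

```python
import copy

# Parameters shared by both splits in the usual use case.
shared_params = {"n_zernikes": 15, "snr_range": [10, 110]}

# Per-split overrides: empty for normal runs, filled in for sensitivity tests.
split_overrides = {
    "train": {},
    "test": {"n_zernikes": 45},  # illustrative value only
}


def resolve_split_params(shared, overrides):
    """Return one full parameter dict per split (shared values plus overrides)."""
    resolved = {}
    for split, override in overrides.items():
        params = copy.deepcopy(shared)  # avoid sharing mutable values between splits
        params.update(override)
        resolved[split] = params
    return resolved


params = resolve_split_params(shared_params, split_overrides)
```

With empty overrides, both splits receive identical parameters, which would cover the usual case without duplicating entries in the config file.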