This PR changes the interface for generating PSD datasets. Most importantly, it adds the option to generate smooth/synthetic PSD datasets based on the latent variable model.
Key changes regarding the PSD generation interface
Each PSD is saved in an `.hdf5` file with keys
- `asds`
- `gps_times`
- `parameters` (optional)

Each of these items is again a dict with a key for each detector and the corresponding data. Maybe this should be changed so that the detectors are the keys on the first level, but I'm not sure.
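To make the layout concrete, here is the current structure mirrored as nested dicts (a sketch only; the detector names and array shapes are made-up examples, not taken from the actual code):

```python
import numpy as np

# Assumed current layout: top-level keys "asds"/"gps_times", then one
# entry per detector underneath (detector names are just examples).
rng = np.random.default_rng(0)
dataset = {
    "asds": {det: rng.random((10, 4097)) for det in ["H1", "L1"]},
    "gps_times": {det: np.arange(10) * 1024.0 for det in ["H1", "L1"]},
}

# Every access has to carry the detector name at the *second* level:
asds_h1 = dataset["asds"]["H1"]
print(asds_h1.shape)
```

This is the structure the TODO below is about: the detector key sits one level down, so code handling one detector's data always needs both keys.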
The dataset generation can be distributed across multiple condor jobs. If condor settings are passed in the settings file, a `condor_dag` is created.
First, all time segments used for estimating PSDs are generated based on `T_PSD`, `T_gap`, and `num_psds_max`, and they are saved to a file; see `estimation.py:get_time_segments`.
All segments are then divided equally across all jobs (per detector). Each job (i) downloads the strain data, estimates the PSDs corresponding to its segments, and saves them in an `.hdf5` file, where each PSD is referenced by the start time of the strain used for its estimation (`estimate.py:download_and_estimate_psds`); and (ii) optionally parameterizes each PSD (`parameterization:parameterize_single_psd`) and saves the parameters under the key `parameters`.
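The "divided equally" step can be sketched like this (a hypothetical helper, not the actual code; round-robin is just one way to balance the job sizes):

```python
# Sketch: split a list of time segments across num_jobs condor jobs so
# that job sizes differ by at most one segment.
def split_segments(segments, num_jobs):
    # Round-robin assignment: job i takes every num_jobs-th segment.
    return [segments[i::num_jobs] for i in range(num_jobs)]

segments = [(i * 1024.0, (i + 1) * 1024.0) for i in range(10)]
jobs = split_segments(segments, 3)
print([len(j) for j in jobs])  # -> [4, 3, 3]
```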
When all jobs are done, `utils.py:merge_datasets` is called, which loads all the individual `.hdf5` files and merges them into a single file. Optionally, we can also save the smooth PSDs reconstructed from the latent parameters.
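The merge amounts to concatenating the per-job arrays key by key and detector by detector; a minimal sketch on in-memory dicts (the real `utils.py:merge_datasets` works on the `.hdf5` files):

```python
import numpy as np

# Sketch: merge a list of per-job dicts ({key: {detector: array}}) into
# one dict by concatenating the arrays for each (key, detector) pair.
def merge_datasets(parts):
    chunks = {}
    for part in parts:
        for key, det_dict in part.items():
            for det, data in det_dict.items():
                chunks.setdefault(key, {}).setdefault(det, []).append(data)
    return {
        key: {det: np.concatenate(arrs) for det, arrs in det_dict.items()}
        for key, det_dict in chunks.items()
    }

parts = [
    {"gps_times": {"H1": np.array([0.0, 1152.0])}},
    {"gps_times": {"H1": np.array([2304.0])}},
]
merged = merge_datasets(parts)
print(merged["gps_times"]["H1"])  # merged times: 0., 1152., 2304.
```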
As a final step, we can resample the dataset created in the previous step (`sampling.py`) by fitting KDEs to the parameters and sampling from them.
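The resampling idea, sketched with a hand-rolled Gaussian KDE sampler (pick a data point, add bandwidth-scaled noise); the actual `sampling.py` may well use a library KDE instead, so treat the function and its bandwidth choice as illustrative assumptions:

```python
import numpy as np

# Sketch: draw new parameter vectors from a Gaussian KDE fitted to the
# existing PSD parameters. Sampling from a Gaussian KDE is equivalent to
# picking a random data point and perturbing it with bandwidth-scaled noise.
def sample_kde(params, num_samples, bandwidth=0.1, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(params), size=num_samples)
    noise = bandwidth * rng.standard_normal((num_samples, params.shape[1]))
    return params[idx] + noise

params = np.random.default_rng(1).normal(size=(100, 3))  # toy parameter set
new_params = sample_kde(params, 1000)
print(new_params.shape)  # -> (1000, 3)
```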
Potential TODOs:
- I left a few notes in the code; they should be self-explanatory.
- Most importantly, we could reconsider the structure of the `.hdf5` files. It would simplify things in some places if we had the detectors as keys on the first level: then the dict for each detector could basically be treated identically, whereas currently we need to keep track of the detector all the time. Changing this, however, would also require changing how ASDs are sampled in `ASDDataset`.
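For comparison, the proposed detector-first layout would look like this (again a dict-level sketch with example detector names and shapes):

```python
import numpy as np

# Sketch of the alternative layout: detectors as first-level keys, so the
# per-detector dicts are self-contained and can be treated identically.
rng = np.random.default_rng(0)
dataset = {
    det: {"asds": rng.random((10, 4097)), "gps_times": np.arange(10) * 1024.0}
    for det in ["H1", "L1"]  # example detector names
}

# Any per-detector operation becomes a uniform loop over dataset.values():
shapes = [d["asds"].shape for d in dataset.values()]
print(shapes)  # -> [(10, 4097), (10, 4097)]
```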