Open benboyd97 opened 5 months ago
YSE DR1 Zenodo link: https://zenodo.org/records/7317476
PLAsTiCC (unblinded) Zenodo link: https://zenodo.org/records/2539456
Added functionality to download YSE DR1 data from Zenodo
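A minimal sketch of what such a download step could look like, using only the standard library. This is not the code from the branch: the function names are hypothetical, and it assumes the Zenodo REST endpoint `https://zenodo.org/api/records/<id>` returns a JSON document whose `files` entries carry a `key` (file name) and a `links.self` download URL.

```python
import json
import os
import urllib.request

# Assumed Zenodo REST endpoint for record metadata.
ZENODO_API = "https://zenodo.org/api/records/{record_id}"


def record_id_from_url(url):
    """Extract the numeric record ID from a Zenodo record URL."""
    return url.rstrip("/").split("/")[-1]


def list_file_links(record):
    """Map file name -> download URL for a Zenodo record JSON document.

    Assumes each entry of record["files"] has "key" and links["self"].
    """
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}


def fetch_record(record_id):
    """Fetch the record metadata (performs a network request)."""
    with urllib.request.urlopen(ZENODO_API.format(record_id=record_id)) as resp:
        return json.load(resp)


def download_record(url, dest_dir="."):
    """Download every file attached to the record into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    record = fetch_record(record_id_from_url(url))
    for name, link in list_file_links(record).items():
        urllib.request.urlretrieve(link, os.path.join(dest_dir, name))
```

Calling `download_record("https://zenodo.org/records/7317476", "yse_dr1")` would then fetch the YSE DR1 files, provided the assumptions about the API response hold.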
time series format (zero-padded to max length):
- time: absolute time in MJD (float32)
- flux
- flux_err
- band: integer band ID
- quality_mask: @jeraud can maybe expand on these details

required metadata:
- object_id: unique within a survey/data source
- ra
- dec
optional metadata (e.g. host galaxy info) can be added for different surveys/sources
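The zero-padded layout above can be sketched with NumPy as follows. The function name and the input structure (a list of dicts, one per object) are assumptions for illustration, not the format used in the branch. Note that zero is also a valid band ID, so padding cannot be recognised from `band` alone; the sketch returns the true lengths alongside the padded arrays.

```python
import numpy as np


def pad_light_curves(light_curves, fields=("time", "flux", "flux_err", "band")):
    """Zero-pad variable-length light curves to the longest one.

    light_curves: list of dicts mapping field name -> 1-D sequence.
    Returns (padded, lengths), where padded maps each field name to an
    array of shape (n_objects, max_len).
    """
    lengths = np.array([len(lc["time"]) for lc in light_curves], dtype=np.int64)
    max_len = int(lengths.max())
    dtypes = {"band": np.int64}  # integer band ID; other fields are float32
    padded = {}
    for name in fields:
        out = np.zeros((len(light_curves), max_len), dtype=dtypes.get(name, np.float32))
        for i, lc in enumerate(light_curves):
            values = np.asarray(lc[name])
            out[i, : len(values)] = values
        padded[name] = out
    return padded, lengths
```

One thing to keep in mind with this layout: float32 has a 24-bit significand, so absolute MJD values around 60000 are only resolved to roughly 0.004 days.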
Have converted YSE data to HDF5 format (may still want to add some more optional metadata columns and rethink formatting of quality mask which is currently 0x00 for all data).
time series row per object: (n_bands, n_features, seq_len)
- n_bands: number of bands
- n_features: likely 3 (timestamp in MJD, flux measurement, flux error)
- seq_len: length of the sequence (padded to the longest sequence of the file)

@helenqu the Cambridge group (@tom-hehir @ado8 @benboyd97 @David-Chemaly) is planning to write a timedomain.py file in the morning that defines a huggingface dataset. I think we are struggling to find a mental bridge between the data prep (which your PR does for plasticc) and the HF datasets format. So we are going to write a file to that end to have a target to build the data for. Let us know where you get up to today so we don't duplicate any of your work. 😄
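For reference, a sketch of how one object's flat observations could be packed into the (n_bands, n_features, seq_len) row described above. The function name and the choice to sort each band by time are assumptions; here seq_len defaults to the longest band of the single object rather than the longest sequence in the file.

```python
import numpy as np


def to_band_block(time, flux, flux_err, band, n_bands, seq_len=None):
    """Pack flat observations into a (n_bands, 3, seq_len) float32 block.

    Feature axis order: 0 = time (MJD), 1 = flux, 2 = flux error.
    Unused trailing slots in each band are left as zeros.
    """
    time, flux, flux_err = (
        np.asarray(a, dtype=np.float32) for a in (time, flux, flux_err)
    )
    band = np.asarray(band)
    counts = [int(np.sum(band == b)) for b in range(n_bands)]
    if seq_len is None:
        seq_len = max(counts) if counts else 0
    block = np.zeros((n_bands, 3, seq_len), dtype=np.float32)
    for b, n in enumerate(counts):
        if n > seq_len:
            raise ValueError(f"band {b} has {n} points but seq_len is {seq_len}")
        sel = np.flatnonzero(band == b)
        sel = sel[np.argsort(time[sel], kind="stable")]  # time-ordered per band
        block[b, 0, :n] = time[sel]
        block[b, 1, :n] = flux[sel]
        block[b, 2, :n] = flux_err[sel]
    return block
```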
@mb010 thanks for letting me know! i just read this and was preparing a dataset loading script. i'm not done yet but will update the PR so you guys can see (will mark where i stopped).
I added the Foundation DR1 data_download.py to the yse branch. We'll create a pull request for the main branch once we've figured out some bugs with the hdf5 file storage.
I added some basic python files for downloading data and building parent samples for CSP DR3 and CfA's SNe II. There's a lot of copy-pasting so there might be room for consolidating common functions.
Added four more SNe Ia datasets from the Pantheon+ compilation, detailed further in issue #30. Work can be found in the panth_sne branch.
finally have dataset loading working for plasticc!
updated code in https://github.com/AstroPile/AstroPile_prototype/pull/22
Have updated build_parent_sample and verified that it works on the following datasets: YSE, SNLS, Foundation, DES Y3
However, the script currently breaks on Swift and PS1, with both giving the same error message:
Traceback (most recent call last):
  File "C:\Users\tomhe\Documents\AstroPile_prototype\scripts\ps1_sne_ia\build_parent_sample.py", line 161, in <module>
    main(args)
  File "C:\Users\tomhe\Documents\AstroPile_prototype\scripts\ps1_sne_ia\build_parent_sample.py", line 70, in main
    metadata[key].append(metadata_[key])
                         ~~~~~~~~~^^^^^
KeyError: 'SNTYPE'
Will investigate this tomorrow -- expect it will be a quick fix...
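One possible way around this, assuming the cause is simply that the Swift and PS1 file headers do not contain an SNTYPE entry (not confirmed in this thread), is to fill missing keys with a placeholder instead of indexing directly. The helper below is hypothetical; only the names `metadata` and `metadata_` come from the traceback.

```python
def append_metadata(metadata, metadata_, keys, missing=None):
    """Append one object's header values to the per-key metadata lists.

    metadata:  dict mapping key -> list of values collected so far.
    metadata_: dict of header values for the current object.
    Keys absent from metadata_ are recorded as `missing` rather than
    raising KeyError.
    """
    for key in keys:
        metadata.setdefault(key, []).append(metadata_.get(key, missing))
    return metadata
```

This keeps every metadata column the same length across objects, at the cost of needing a placeholder value that the downstream HDF5 writer can store.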
Update on storage of timeseries data:
- time, flux, flux_err organised by band into a block of shape (num_examples, num_bands, 3, sequence_length)
- mag and mag_err
- lightcurve and lightcurve_additional and storing the core timeseries variables in the former and additional timeseries variables in the latter. Another option would be to store them in a single block (but this is in conflict with the suggestion below...)

Possible new convention for storage:
- (num_examples, num_bands, 3, sequence_length), instead the user can specifically access e.g. the flux timeseries by indexing the data with [flux].
- (time, flux, flux_err) can all be represented as np.float32 and put into a single array, other timeseries have heterogeneous data types which prohibits combining them into a single array.

TODOs:
- test.sh script for each dataset
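To make the two storage options concrete, here is a small sketch contrasting access by name with access by position in a stacked block. The example data, the `stack_core` helper and the uint8 dtype for `quality_mask` are assumptions for illustration.

```python
import numpy as np

CORE_FIELDS = ("time", "flux", "flux_err")  # all representable as np.float32


def stack_core(example):
    """Stack the core fields into one (num_bands, 3, sequence_length) block."""
    return np.stack(
        [np.asarray(example[name], dtype=np.float32) for name in CORE_FIELDS],
        axis=1,
    )


# Hypothetical single example with 2 bands and sequence length 3.
example = {
    "time": np.array([[1, 2, 3], [1, 2, 0]], dtype=np.float32),
    "flux": np.array([[10, 11, 12], [20, 21, 0]], dtype=np.float32),
    "flux_err": np.ones((2, 3), dtype=np.float32),
    # A different dtype, so it cannot share an array with the core fields
    # without casting.
    "quality_mask": np.zeros((2, 3), dtype=np.uint8),
}

flux_by_name = example["flux"]   # named access, no positional convention
block = stack_core(example)      # single-block layout
flux_by_position = block[:, 1]   # relies on knowing flux is feature index 1
```

Named access avoids hard-coding the feature order, while the stacked block is only possible for fields that share a dtype.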
Add Time Domain Data to AstroPile
Add time domain photometric data (e.g. PLAsTiCC/Young Supernova Experiment) and spectroscopic data (e.g. Kaepora) to AstroPile.
Contacts:
Participants: Ben Boyd, Erin Hayes, Tom Hehir, David Chemaly, Helen Qu
Goals and deliverable
[describe your goals for the week and the deliverables you are aiming for]
Resources needed
[describe the resources (software, skills, data, or just enthusiasm) needed for this project]
Detailed description
Photometry:
Spectra:
Questions to answer: