AstroPile / FlatironMeeting2024

AstroPile meet-up at the Flatiron Institute
https://astropile.github.io/FlatironMeeting2024/
MIT License

[Data] Add Time Domain Data to AstroPile #17

Open benboyd97 opened 5 months ago

benboyd97 commented 5 months ago

Add Time Domain Data to AstroPile

Add time domain photometric data (e.g. PLAsTiCC/Young Supernova Experiment) and spectroscopic data (e.g. Kaepora) to AstroPile.

Contacts:

Participants: Ben Boyd, Erin Hayes, Tom Hehir, David Chemaly, Helen Qu

Goals and deliverable

[describe your goals for the week and the deliverables you are aiming for]

Resources needed

[describe the resources (software, skills, data, or just enthusiasm) needed for this project]

Detailed description

Photometry:

Spectra:

Questions to answer:

erinhay commented 5 months ago

YSE DR1 Zenodo link: https://zenodo.org/records/7317476

PLAsTiCC (unblinded) Zenodo link: https://zenodo.org/records/2539456

erinhay commented 5 months ago

Added functionality to download YSE DR1 data from Zenodo

https://github.com/AstroPile/AstroPile_prototype/tree/yse
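
For readers who want to see roughly what such a download step involves, here is a minimal sketch. It is not the code on the yse branch. It assumes the Zenodo REST API returns a record as JSON with a `files` list whose entries carry a `key` (file name) and a `links.self` download URL; the helper names `fetch_record` and `file_urls` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for Zenodo's REST API.
ZENODO_API = "https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id):
    """Fetch the JSON description of a Zenodo record (needs network access)."""
    with urlopen(ZENODO_API.format(record_id=record_id)) as resp:
        return json.load(resp)


def file_urls(record):
    """Map each file name in a record to its download URL.

    Assumes each entry in record["files"] has "key" and "links" -> "self".
    """
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}
```

Calling `file_urls(fetch_record(7317476))` would then list the files attached to the YSE DR1 record, provided the assumed JSON layout holds.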

helenqu commented 5 months ago

time series format (zero-padded to max length):

required metadata:

optional metadata (e.g. host galaxy info) can be added for different surveys/sources
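
A minimal sketch of the zero-padding idea, assuming each light curve is a plain 1-D sequence of values. The function name `pad_light_curves` and the accompanying boolean mask are illustrative additions, not part of the agreed format.

```python
import numpy as np


def pad_light_curves(curves, pad_value=0.0):
    """Stack variable-length 1-D sequences into one (n_objects, max_len) array.

    Returns the padded array and a boolean mask that is True where the
    entry is real data and False where it is padding.
    """
    max_len = max(len(c) for c in curves)
    padded = np.full((len(curves), max_len), pad_value, dtype=np.float64)
    mask = np.zeros((len(curves), max_len), dtype=bool)
    for i, c in enumerate(curves):
        padded[i, : len(c)] = c
        mask[i, : len(c)] = True
    return padded, mask
```

Keeping a mask alongside the padded array is one way to tell a genuine zero flux apart from padding.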

helenqu commented 5 months ago

plasticc: https://github.com/AstroPile/AstroPile_prototype/pull/22

tom-hehir commented 5 months ago

Have converted YSE data to HDF5 format (may still want to add some more optional metadata columns and rethink formatting of quality mask which is currently 0x00 for all data).

helenqu commented 5 months ago

time series row per object: (n_bands, n_features, seq_len)
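
To make the shape concrete, here is a rough sketch that packs one object's observations into an array of shape (n_bands, n_features, seq_len). The choice of features (time, flux, flux error), the function name `pack_object`, and the truncation of bands with more than seq_len points are all assumptions for illustration.

```python
import numpy as np


def pack_object(times, fluxes, flux_errs, bands, band_order, seq_len):
    """Pack one object's observations into (n_bands, n_features, seq_len).

    Feature axis is (time, flux, flux_err); unused slots stay zero.
    Bands with more than seq_len observations are truncated.
    """
    columns = [np.asarray(x, dtype=np.float64) for x in (times, fluxes, flux_errs)]
    bands = np.asarray(bands)
    out = np.zeros((len(band_order), len(columns), seq_len), dtype=np.float64)
    for b, band in enumerate(band_order):
        sel = bands == band  # observations taken in this band
        n = min(int(sel.sum()), seq_len)
        for f, col in enumerate(columns):
            out[b, f, :n] = col[sel][:n]
    return out
```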

mb010 commented 5 months ago

@helenqu the Cambridge group (@tom-hehir @ado8 @benboyd97 @David-Chemaly) is planning to write a timedomain.py file in the morning that defines a Hugging Face dataset. I think we are struggling to find a mental bridge between the data prep (which your PR does for plasticc) and the HF datasets format. So we are going to write a file to that end, to have a target to build the data for. Let us know where you get up to today so we don't duplicate any of your work. 😄

helenqu commented 5 months ago

@mb010 thanks for letting me know! i just read this and was preparing a dataset loading script. i'm not done yet but will update the PR so you guys can see (will mark where i stopped).

benboyd97 commented 5 months ago

I added the Foundation DR1 data_download.py to the yse branch. We'll create a pull request for the main branch once we've figured out some bugs with the hdf5 file storage.

ado8 commented 5 months ago

I added some basic python files for downloading data and building parent samples for CSP DR3 and CfA's SNe II. There's a lot of copy-pasting so there might be room for consolidating common functions.

benboyd97 commented 5 months ago

Added four more SNe Ia datasets from the Pantheon+ compilation, detailed further in issue #30. Work can be found in the panth_sne branch.

helenqu commented 5 months ago

finally have dataset loading working for plasticc!

updated code in https://github.com/AstroPile/AstroPile_prototype/pull/22

tom-hehir commented 5 months ago

Have updated build_parent_sample and verified that it works on the following datasets: YSE, SNLS, Foundation, DES Y3

However, the script currently breaks on Swift and PS1, both giving the same error message:

Traceback (most recent call last):
  File "C:\Users\tomhe\Documents\AstroPile_prototype\scripts\ps1_sne_ia\build_parent_sample.py", line 161, in <module>
    main(args)
  File "C:\Users\tomhe\Documents\AstroPile_prototype\scripts\ps1_sne_ia\build_parent_sample.py", line 70, in main
    metadata[key].append(metadata_[key])
                         ~~~~~~~~~^^^^^
KeyError: 'SNTYPE'

Will investigate this tomorrow -- expect it will be a quick fix...
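
The traceback shows that `metadata_` has no 'SNTYPE' entry for these files. One possible guard, sketched below, is to fall back to a placeholder when a key is absent. This assumes the key is simply missing from some surveys' headers, which has not been confirmed; `collect_metadata` and its arguments are hypothetical names, not the ones used in build_parent_sample.py.

```python
from collections import defaultdict


def collect_metadata(headers, keys, missing=None):
    """Accumulate per-object header values into column lists.

    Uses dict.get so that a header lacking one of the keys contributes
    the `missing` placeholder instead of raising KeyError.
    """
    metadata = defaultdict(list)
    for header in headers:
        for key in keys:
            metadata[key].append(header.get(key, missing))
    return dict(metadata)
```

Whether a placeholder is acceptable, or the column should be dropped for those surveys, depends on how the metadata is used downstream.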

tom-hehir commented 5 months ago

Update on storage of timeseries data:

Possible new convention for storage:

mb010 commented 5 months ago

Mini summary of our discussion:

TODOs: