AstroPile / FlatironMeeting2024

AstroPile meet-up at the Flatiron Institute
https://astropile.github.io/FlatironMeeting2024/
MIT License

[Data] Include spectral datasets #19

Open maja-jablonska opened 3 months ago

maja-jablonska commented 3 months ago

Include spectral datasets

  1. Decide on a metadata format for spectra (some surveys provide inferred parameters, e.g., Teff, abundances)
  2. Do we treat spectral time series differently? They might not necessarily be available in a large volume currently, but we might want to add them.
  3. How do we want to treat Gaia BP/RP spectra? Do we preprocess them?

Contacts: @maja-jablonska

Participants: @maja-jablonska @henrysky @pmelchior @al-jshen

Goals and deliverable

  1. A fixed format for inferred parameters tied to spectra
  2. Some spectral datasets added - APOGEE, GALAH, HARPS? Gaia?

Resources needed

  1. Enthusiasm
  2. Some experience with a spectral dataset of choice

Detailed description

pmelchior commented 3 months ago

I'm trying to get the VIPERS DR2 data. These are 91,507 galaxies, complete down to some magnitude limit.

maja-jablonska commented 3 months ago

I wonder about the data format, so that all spectral datasets can be treated homogeneously. What do you think about designing a homogeneous schema? How should we deal with inferred properties that are available in some surveys (e.g. inferred mass, abundances, etc.)? Maybe we should have a metadata field.

maja-jablonska commented 3 months ago

Opened #hackathon-spectra

maja-jablonska commented 3 months ago

Refer to issue #17 for the schema and keep a similar format.

maja-jablonska commented 3 months ago

Still [WIP], but I have added data preprocessing for GALAH. I will continue with a HuggingFace datasets-compatible class tomorrow, and will probably add some grouping for larger data. Developing in https://github.com/AstroPile/AstroPile_prototype/pull/24

al-jshen commented 3 months ago

I will try to add the Gaia BP/RP spectra (and some other Gaia info).

maja-jablonska commented 3 months ago

@al-jshen, @henrysky, and all interested: do you think we should include the inferred values (log g, etc.) in the same datasets? If there are not a lot of values, then no problem, but e.g. in GALAH there are inferred abundances and there might be a lot of columns. Maybe a separate dataset with object_id and corresponding abundances would be more accessible - but then again, we'd have to join datasets, which is always an overhead.
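
For illustration only, a minimal sketch of what the "separate dataset plus join" option would look like. The records and column names (`teff`, `logg`, `fe_h`, `mg_fe`) are hypothetical stand-ins, not the actual GALAH or AstroPile schema; only `object_id` is taken from the discussion.

```python
# Hypothetical records: one table with basic parameters, one with abundances.
# Column names are assumptions for illustration, not a real survey schema.
spectra = [
    {"object_id": 1, "teff": 5750.0, "logg": 4.4},
    {"object_id": 2, "teff": 4800.0, "logg": 2.5},
]
abundances = [
    {"object_id": 1, "fe_h": 0.02, "mg_fe": 0.05},
]


def left_join(left, right, key="object_id"):
    """Attach the columns of the matching `right` row to each `left` row.

    Rows of `left` without a match are kept unchanged.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        extra = index.get(row[key], {})
        joined.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return joined


joined = left_join(spectra, abundances)
```

The join itself is cheap to write; the overhead mentioned above is mostly that every user has to repeat it, and that objects missing from the abundance table end up with missing columns.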

henrysky commented 3 months ago

I do think we should at least include basic stellar parameters like teff, logg, [M/H] and [Alpha/M], which should be available in most spectroscopic Galactic surveys. But even for teff and logg, there are systematics between surveys...

maja-jablonska commented 3 months ago

that's true. 😞 but is there anything we can do about it except for noting it down? :/

Also, I think we should include the timestamp, in case any object has more than one spectrum.
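
A rough sketch of why the timestamp matters, assuming hypothetical `object_id` and `timestamp` fields (here taken to be MJD values): with a timestamp stored per spectrum, repeat observations can be grouped into a time-ordered series per object.

```python
from collections import defaultdict

# Hypothetical observations; field names and values are made up for illustration.
observations = [
    {"object_id": "A", "timestamp": 59002.5, "flux": [1.0, 0.9]},
    {"object_id": "B", "timestamp": 59001.0, "flux": [0.8, 0.7]},
    {"object_id": "A", "timestamp": 59000.5, "flux": [1.1, 1.0]},
]


def group_time_series(rows):
    """Group spectra by object and sort each group by observation time."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["object_id"]].append(row)
    for visits in grouped.values():
        visits.sort(key=lambda r: r["timestamp"])
    return dict(grouped)


series = group_time_series(observations)
```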

al-jshen commented 3 months ago

In my PR for Gaia, the format returns multiple keys. There is spectrum, with all the info about the spectrum, and then params (or whatever else), with all the stellar parameters. It looks something like this:

[Screenshot 2024-03-27 at 3 57 33 PM]

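
As a stand-in for the kind of layout described (one `spectrum` key and one `params` key per example), here is a minimal sketch. It is not the code from the Gaia PR: the class names and the inner field names (`flux`, `ivar`, `wavelength`, `teff`, `logg`) are assumptions chosen for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Spectrum:
    # Hypothetical per-pixel arrays; the real field names may differ.
    flux: List[float]
    ivar: List[float]
    wavelength: List[float]


@dataclass
class Example:
    object_id: str
    spectrum: Spectrum
    # Inferred stellar parameters are kept apart from the spectrum itself.
    params: Dict[str, float] = field(default_factory=dict)


example = Example(
    object_id="demo-0001",
    spectrum=Spectrum(
        flux=[1.0, 0.9, 1.1],
        ivar=[4.0, 4.0, 3.5],
        wavelength=[400.0, 500.0, 600.0],
    ),
    params={"teff": 5750.0, "logg": 4.4},
)

# Nested plain-dict form, with "spectrum" and "params" as separate top-level keys.
record = asdict(example)
```

Keeping the parameters under their own key means a survey with many inferred columns does not clutter the spectrum entry, and a survey with none can leave `params` empty.
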
maja-jablonska commented 3 months ago

GALAH: I need to add resolution information, otherwise ready!

maja-jablonska commented 3 months ago

Thursday discussion: