Open maja-jablonska opened 3 months ago
I'm trying to get the VIPERS DR2 data. These are 91,507 galaxies that are complete down to some magnitude limit.
I wonder about the data format, so that all spectral datasets can be treated homogeneously. What do you think about designing a homogeneous schema? How should we deal with inferred properties that are available only in some surveys (e.g. inferred mass, abundances, etc.)? Maybe we should have a metadata field.
Opened #hackathon-spectra
Refer to issue #17 for the schema and keep a similar format.
Still [WIP], but I have added data preprocessing for GALAH. I will continue with a HuggingFace datasets-compatible class tomorrow, and will probably add some grouping for larger data. Developing in https://github.com/AstroPile/AstroPile_prototype/pull/24
I will try to add the Gaia BP/RP spectra (and some other Gaia info).
@al-jshen, @henrysky, and all interested: do you think we should include the inferred values (log g, etc.) in the same datasets? If there are only a few values, then no problem, but in GALAH, for example, there are inferred abundances, which could mean a lot of columns. Maybe a separate dataset with object_id and the corresponding abundances would be more accessible, but then again we'd have to join datasets, which is always an overhead.
I do think we should at least include basic stellar parameters like teff, logg, [M/H] and [Alpha/M], which should be available in most Galactic spectroscopic surveys. But even for teff and logg, there are systematics between surveys...
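The separate-table option could be sketched roughly as below, in plain Python. This is only an illustration: the column names (`teff`, `logg`, `fe_h`, `mg_fe`), the values, and the helper `join_on_object_id` are all made up and are not part of any AstroPile code.

```python
# Hypothetical data: basic parameters live with the spectra, while the
# (potentially many) inferred abundances sit in a second table keyed by object_id.
spectra = [
    {"object_id": 1, "teff": 5772.0, "logg": 4.44},
    {"object_id": 2, "teff": 4500.0, "logg": 2.10},
    {"object_id": 3, "teff": 6100.0, "logg": 4.10},
]
abundances = [
    {"object_id": 1, "fe_h": 0.00, "mg_fe": 0.02},
    {"object_id": 3, "fe_h": -0.35, "mg_fe": 0.15},
]


def join_on_object_id(left, right):
    """Left join: keep every row of `left`, attach matching columns from `right`."""
    # Build a lookup once so each left row is matched in constant time.
    index = {row["object_id"]: row for row in right}
    joined = []
    for row in left:
        extra = index.get(row["object_id"], {})  # no match -> no extra columns
        joined.append({**row, **extra})
    return joined


catalog = join_on_object_id(spectra, abundances)
```

Objects without abundances (here `object_id` 2) simply keep their basic columns, which is the join overhead mentioned above made explicit.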
that's true. 😞 but is there anything we can do about it except for noting it down? :/
Also, I think we should include the timestamp, in case any object has more than one spectrum.
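To illustrate why a timestamp helps: with an observation time stored per spectrum, repeat observations of the same object can be grouped and ordered. A minimal sketch, assuming hypothetical fields `object_id`, `mjd` (Modified Julian Date) and `spectrum_id`; none of these names come from the thread.

```python
from collections import defaultdict

# Hypothetical rows: object "A" was observed twice, object "B" once.
rows = [
    {"object_id": "A", "mjd": 58010.5, "spectrum_id": "A-2"},
    {"object_id": "B", "mjd": 57990.1, "spectrum_id": "B-1"},
    {"object_id": "A", "mjd": 57800.2, "spectrum_id": "A-1"},
]


def group_by_object(rows):
    """Group spectra by object and order each group by observation time."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["object_id"]].append(row)
    for visits in groups.values():
        visits.sort(key=lambda r: r["mjd"])  # earliest observation first
    return dict(groups)


visits = group_by_object(rows)
```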
In my PR for Gaia, the way I've done this is that the format returns multiple keys. There is `spectrum` with all the info about the spectrum, and then `params` (or whatever else) with all the stellar parameters. Looks something like this:
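A minimal sketch of a record with separate `spectrum` and `params` keys, for illustration only. The helper `make_example`, the inner field names (`wavelength`, `flux`, `flux_err`, `teff`, `logg`) and the values are assumptions and are not taken from the Gaia PR.

```python
def make_example(object_id, wavelength, flux, flux_err, **stellar_params):
    """Build one nested record: spectrum arrays under one key, parameters under another."""
    if not (len(wavelength) == len(flux) == len(flux_err)):
        raise ValueError("wavelength, flux and flux_err must have the same length")
    return {
        "object_id": object_id,
        "spectrum": {
            "wavelength": list(wavelength),
            "flux": list(flux),
            "flux_err": list(flux_err),
        },
        # Survey-specific inferred values stay out of the spectrum itself.
        "params": dict(stellar_params),
    }


example = make_example(
    "obj-0001",                      # hypothetical identifier
    wavelength=[400.0, 500.0, 600.0],
    flux=[1.0, 1.2, 0.9],
    flux_err=[0.1, 0.1, 0.1],
    teff=5500.0,
    logg=4.3,
)
```

Keeping the parameters in their own key means a survey without inferred values can return an empty `params` without changing the shape of `spectrum`.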
GALAH: I need to add resolution information, otherwise ready!
Thursday discussion:
Include spectral datasets
Contacts: @maja-jablonska
Participants: @maja-jablonska @henrysky @pmelchior @al-jshen
Goals and deliverables
Resources needed
Enthusiasm
Some experience with a spectral dataset of choice
Detailed description