Closed dr-rodriguez closed 7 months ago
This is a follow-up to #205. And I'll put a proposed revised table: regime | UCD | Description |
---|---|---|
opt | em.opt | Optical part of the spectrum |
J | em.IR.J | Infrared between 1.0 and 1.5 micron |
H | em.IR.H | Infrared between 1.5 and 2 micron |
K | em.IR.K | Infrared between 2 and 3 micron |
NIR | em.IR.NIR | Near-Infrared, 1-5 microns |
MIR | em.IR.MIR | Medium-Infrared, 5-30 microns |
FIR | em.IR.FIR | Far-Infrared, 30-100 microns |
Thanks, I didn't realize we had a similar issue already in place. I think there are 2 things to consider here: one is the consolidation of our different values, but the other is the use of UCD here.
Unified Content Descriptor are meant for describing the values in a particular column. For the Photometry table, if the user pivots their results into multiple columns per band they can use the UCD column to identify that the new columns they've created are J-band emission magnitudes and similar. If a user does the same for the SpectralType table, the regime would appear to be saying that something like "M5" is actually J-band emission. That does not really make sense. Regime for tables like Spectra and SpectralTypes is, in my opinion, meant to be a human readable distinguisher to capture the rough wavelengths of interest. It is not meant to be a machine readable classifier describing the type of emission (eg, would we really have a J-band spectral type?). I think we're trying to be clever by recycling the UCD here, but what results is a more complex model that doesn't really add much.
Well said.
So, are you happy with my proposed table change?
No, what I would propose is that the regime enumerations be:
This is basically the same as what's in https://github.com/SIMPLE-AstroDB/SIMPLE-db/blob/main/documentation/SpectralTypes.md but removing nir and avoiding references to UCD.
The Photometry table can use UCD like those you list (and be un-bounded to grow with whatever new values we add), but Spectra, SpectralTypes, etc would have one of these 8 values.
I took a closer look as I was curious about something and realized that the database is storing the variable value of the enumeration, not the string. So, for example, in nir_UCD = 'em.IR.NIR'
the database stores and checks nir_UCD
; it does not actually recognize em.IR.NIR
as a valid value. This is probably since we are not using the full ORM capabilities of SQL Alchemy and/or since we are using the JSON files as the storage mechanism. So that this means is that enumerations should match both the variable and the string, otherwise we introduce confusion as to what is being stored. This also means that enumerations like beta/gamma
are invalid, as that is not a valid python variable name.
What a mess which I take complete responsibility for. I think this is relatively high priority and should be accompanied by a discussion of the best use of the UCD vocabulary. #293
Also, we decided to convert this a Reference table rather than using enumerations.
I still struggle with this. I think my biggest problem is that I'm uncomfortable making a new taxonomy when one already exists.
If it helps, you can consider doing this in two steps: first rework this as a reference table rather than enumerations. Then once that's done examine how many values we are storing and whether we need to consolidate them.
That's a good point. Just make the new table and linked fields and worry about taxonomy later. Love it.
Also need to update the documentation: https://github.com/SIMPLE-AstroDB/SIMPLE-db/blob/main/documentation/Spectra.md
We have a mismatch of regime values across the database and in the enumeration description, we have things like optical and optical_UCD. I don't think UCD are really adding anything to this and are just causing inconsistencies so I propose we remove all UCD regime values and turn them to their simpler counterparts; so instead of using things like
em.IR.NIR
we just usenir
Here are the values (post the Best2020 SpectralType PR) we have: