SIMPLE-AstroDB / SIMPLE-db

BSD 3-Clause "New" or "Revised" License
11 stars 22 forks source link

Clean up regime values in database and make reference table #271

Closed dr-rodriguez closed 7 months ago

dr-rodriguez commented 2 years ago

We have a mismatch of regime values across the database and in the enumeration description, we have things like optical and optical_UCD. I don't think UCD are really adding anything to this and are just causing inconsistencies so I propose we remove all UCD regime values and turn them to their simpler counterparts; so instead of using things like em.IR.NIR we just use nir

Here are the values (post the Best2020 SpectralType PR) we have:

t = db.query(db.Spectra.c.regime, func.count(db.Spectra.c.regime).label('counts')). \
    group_by(db.Spectra.c.regime). \
    astropy()
print('Spectra')
print(t)

t = db.query(db.SpectralTypes.c.regime, func.count(db.SpectralTypes.c.regime).label('counts')). \
    group_by(db.SpectralTypes.c.regime). \
    astropy()
print('SpectralTypes')
print(t)

Spectra
  regime  counts
--------- ------
em.IR.NIR    118
   em.opt     21
      mir     91
      nir    456
  optical    718
SpectralTypes
 regime  counts
-------- ------
infrared     22
     nir     41
 nir_UCD   2081
 optical   1494
 unknown     10
kelle commented 2 years ago
This is a follow-up to #205. And I'll put a proposed revised table: regime UCD Description
opt em.opt Optical part of the spectrum
J em.IR.J Infrared between 1.0 and 1.5 micron
H em.IR.H Infrared between 1.5 and 2 micron
K em.IR.K Infrared between 2 and 3 micron
NIR em.IR.NIR Near-Infrared, 1-5 microns
MIR em.IR.MIR Medium-Infrared, 5-30 microns
FIR em.IR.FIR Far-Infrared, 30-100 microns
dr-rodriguez commented 2 years ago

Thanks, I didn't realize we had a similar issue already in place. I think there are 2 things to consider here: one is the consolidation of our different values, but the other is the use of UCD here.

Unified Content Descriptor are meant for describing the values in a particular column. For the Photometry table, if the user pivots their results into multiple columns per band they can use the UCD column to identify that the new columns they've created are J-band emission magnitudes and similar. If a user does the same for the SpectralType table, the regime would appear to be saying that something like "M5" is actually J-band emission. That does not really make sense. Regime for tables like Spectra and SpectralTypes is, in my opinion, meant to be a human readable distinguisher to capture the rough wavelengths of interest. It is not meant to be a machine readable classifier describing the type of emission (eg, would we really have a J-band spectral type?). I think we're trying to be clever by recycling the UCD here, but what results is a more complex model that doesn't really add much.

kelle commented 2 years ago

Well said.

So, are you happy with my proposed table change?

dr-rodriguez commented 2 years ago

No, what I would propose is that the regime enumerations be:

This is basically the same as what's in https://github.com/SIMPLE-AstroDB/SIMPLE-db/blob/main/documentation/SpectralTypes.md but removing nir and avoiding references to UCD.

The Photometry table can use UCD like those you list (and be un-bounded to grow with whatever new values we add), but Spectra, SpectralTypes, etc would have one of these 8 values.

dr-rodriguez commented 2 years ago

I took a closer look as I was curious about something and realized that the database is storing the variable value of the enumeration, not the string. So, for example, in nir_UCD = 'em.IR.NIR' the database stores and checks nir_UCD; it does not actually recognize em.IR.NIR as a valid value. This is probably since we are not using the full ORM capabilities of SQL Alchemy and/or since we are using the JSON files as the storage mechanism. So that this means is that enumerations should match both the variable and the string, otherwise we introduce confusion as to what is being stored. This also means that enumerations like beta/gamma are invalid, as that is not a valid python variable name.

kelle commented 2 years ago

What a mess which I take complete responsibility for. I think this is relatively high priority and should be accompanied by a discussion of the best use of the UCD vocabulary. #293

kelle commented 1 year ago

Also, we decided to convert this a Reference table rather than using enumerations.

kelle commented 1 year ago

I still struggle with this. I think my biggest problem is that I'm uncomfortable making a new taxonomy when one already exists.

dr-rodriguez commented 1 year ago

If it helps, you can consider doing this in two steps: first rework this as a reference table rather than enumerations. Then once that's done examine how many values we are storing and whether we need to consolidate them.

kelle commented 1 year ago

That's a good point. Just make the new table and linked fields and worry about taxonomy later. Love it.

kelle commented 8 months ago

Also need to update the documentation: https://github.com/SIMPLE-AstroDB/SIMPLE-db/blob/main/documentation/Spectra.md