astrodbtoolkit / astrodb_utils

https://astrodb-utils.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
0 stars 4 forks source link

Revamp `ingest_photometry` function #50

Open kelle opened 5 months ago

kelle commented 5 months ago

Evan (@exu-112) is going to be ingesting some photometry next week (https://github.com/SIMPLE-AstroDB/SIMPLE-db/issues/515). do we want him to use this old-style function? or do we want to re-write it and let him test a new-style function? @dr-rodriguez ?

dr-rodriguez commented 5 months ago

I recommend using the new-style with the ORM. It'll be better in the long run and its a good time to test it out.

kelle commented 4 months ago

Well, we didn't change it here cause the function was too easy to write. Here it is:

def ingest_Photometry(
    source: str = None,
    band: str = None,
    magnitude: float = None,
    mag_error: float = None,
    telescope: str = None,
    comment: str = None,
    reference: str = None,
    raise_error: bool = True,
):
    table_data = {
        "source": source,
        "band": band,
        "magnitude": magnitude,
        "magnitude_error": mag_error,
        "telescope": telescope,
        "comments": comment,
        "reference": reference,
    }

    try:
        pho_obj = Photometry(**table_data)
        with db.session as session:
            session.add(pho_obj)
            session.commit()
        logger.info(f" Photometry added to database: {table_data}\n")
    except sqlalchemy.exc.IntegrityError as e:
        if "UNIQUE constraint failed:" in str(e):
            msg = f" Photometry with same reference already exists in database: {table_data}"
            if raise_error:
                raise AstroDBError(msg) from e
            else:
                msg2 = f"SKIPPING: {source}. Photometry with same reference already exists in database."
                logger.warning(msg2)
        else:
            msg = f"Could not add {table_data} to database. Error: {e}"
            logger.warning(msg)
            raise AstroDBError(msg) from e
kelle commented 4 months ago

@dr-rodriguez , Evan has identified an interesting challenge to using this "new" way of ingesting in the astrodb_utils package. In order to do this, we have to import the relevant table from the schema. Should we by default import the table from the template schema? And then maybe provide options if the table has a custom setup? Hmmm....we should discuss and think.

kelle commented 4 months ago

Well, the error handling does need to be improved. Because of the validation, Python returned this super useful error:

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) 
NOT NULL constraint failed: Photometry.regime

But the ingest_photometry function returned this:

astrodb_utils.utils.AstroDBError: 
The source may not exist in Sources table.
The band may not exist in the PhotometryFilters table.
The reference may not exist in the Publications table. Add it with add_publication function.
dr-rodriguez commented 4 months ago

See the related comment at https://github.com/astrodbtoolkit/astrodb_utils/issues/43#issuecomment-2058990448

Something like this works in general regardless of the table being populated:

def ingest_object(db, DBTable, data_dict):
    obj = DBTable(**data_dict)
    with db.session as session:
         session.add(obj)
         session.commit()

This bare-bones function takes the Table object which contains the proper validation, as well as the database and the dictionary to be loaded.

That argues that we don't even need separate ingest_photometry, ingest_sources, etc etc as long as your schema includes proper validation. I've mentioned it before, but the more specific we make our functions, the more convoluted the code is (since we need to tie it to the schema) and the less useful/generalized it is.

kelle commented 4 months ago

Here is some compromise pseuodcode:


from schema import Photometry
from schema import *

db = load_db(blah)

photometry_data = {stuff}

ingest_photometry(db, table=Photometry, data=photometry_data)
dr-rodriguez commented 4 months ago

Sounds good. To clarify, the module that contains the definition won't have from schema import Photometry or similar, it would just have the function definition. It's up to the user to do the schema imports before calling any of the ingest functions.