Open kelle opened 7 months ago
Here is an example from the documentation (see https://astrodbkit2.readthedocs.io/en/latest/#using-the-sqlalchemy-orm):
```python
from astrodbkit2.astrodb import Database
from schema import Sources

db = Database(connection_string)

# Create a new object and insert it into the database
s = Sources(source="V4046 Sgr", ra=273.54, dec=-32.79)
with db.session as session:
    session.add(s)
    session.commit()
```
This works for any table, not just `Sources`; we can also create a list of these objects and add them all in one go.
If I wanted to add more values, would I do `s.comment = "string comment"`?
I'm actually not sure; I expect so, but it's worth trying out.
Confirmed that things like `s.comments = "this is a comment"` work, so one can build the object iteratively.
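A minimal illustration of that iterative pattern, using a plain Python class as a stand-in for the ORM table (the class here is illustrative, not the real schema class):

```python
# Illustrative stand-in for the ORM Sources class (not the real schema)
class Sources:
    def __init__(self, source, ra, dec):
        self.source = source
        self.ra = ra
        self.dec = dec

s = Sources(source="V4046 Sgr", ra=273.54, dec=-32.79)
s.comments = "this is a comment"  # attributes can be attached after construction
```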
I've been thinking about this some more and would like to propose we create a function in astrodb_utils of the type:
```python
def ingest_object(db, DBTable, data_dict):
    obj = DBTable(**data_dict)
    with db.session as session:
        session.add(obj)
        session.commit()
```
In this function, `db` is the AstrodbKit2 `Database` object, `DBTable` is a SQLAlchemy ORM table (e.g., `Sources` in `from schema import Sources`), and `data_dict` is a dictionary that contains the information to store, with keys that match the column names.
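As a runnable sketch of how the pieces fit together, here is the proposed function exercised against stand-in classes (the `Fake*` classes below are illustrative stubs, not the actual astrodbkit2 API):

```python
# Stub mimicking the context-manager behavior of db.session (illustration only)
class FakeSession:
    def __init__(self, store):
        self._store = store
        self._pending = []
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def add(self, obj):
        self._pending.append(obj)
    def commit(self):
        self._store.extend(self._pending)
        self._pending = []

# Stub mimicking the AstrodbKit2 Database object (illustration only)
class FakeDatabase:
    def __init__(self):
        self.rows = []
    @property
    def session(self):
        return FakeSession(self.rows)

# Stand-in for a SQLAlchemy ORM table: keyword arguments become columns
class Sources:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def ingest_object(db, DBTable, data_dict):
    obj = DBTable(**data_dict)
    with db.session as session:
        session.add(obj)
        session.commit()

db = FakeDatabase()
ingest_object(db, Sources, {"source": "V4046 Sgr", "ra": 273.54, "dec": -32.79})
```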
There are a lot of benefits to this approach: if you want to ingest a new source, call `ingest_object(db, Sources, data_dict)`; if you want to ingest new photometry, call `ingest_object(db, Photometry, data_dict)`; new spectra, new gravities? It all works the same way. What we need is a well-constructed `data_dict` and good documentation so users know which keys they can use and what an ingest pipeline can look like. In fact, I would argue against creating "black box" functions like `ingest_sources` and instead guide users on how to build a script that calls this `ingest_object` function. I strongly recommend we consider this, as it will make creating and maintaining the Toolkit far easier. There's still a place for helper functions, but this does mean the Toolkit moves toward being more guidelines and tutorials as opposed to lots of specific code.
Thoughts @kelle @arjunsavel @Will-Cooper?
I'll need to try it out!
One of the most important things to check is whether the error messages it throws are useful. That's one of the most valuable aspects of the current ingest functions: they do a bunch of checks and provide useful feedback.
That is a fair point, and I think that's where some of the schema validation functions can come in handy. That said, one can also have functions that check certain aspects of `data_dict` prior to the actual ingest, similar to what our older functions do.
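For example, a pre-ingest check could compare the dictionary's keys against the expected columns and return readable problems. The column sets and limits below are assumptions for illustration, not the actual template schema:

```python
# Hypothetical pre-ingest check; these column sets are assumptions,
# not the actual astrodb-template-db schema.
REQUIRED_COLUMNS = {"source", "ra", "dec"}
OPTIONAL_COLUMNS = {"epoch", "equinox", "shortname", "reference", "comments"}

def validate_source_dict(data_dict):
    """Return a list of human-readable problems; an empty list means the dict looks ingestible."""
    problems = []
    missing = REQUIRED_COLUMNS - data_dict.keys()
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    unknown = data_dict.keys() - REQUIRED_COLUMNS - OPTIONAL_COLUMNS
    if unknown:
        problems.append(f"unknown columns (check spelling): {sorted(unknown)}")
    ra = data_dict.get("ra")
    if ra is not None and not (0 <= ra < 360):
        problems.append(f"ra={ra} is outside [0, 360)")
    return problems
```

A loop over such checks before calling `ingest_object` would reproduce the friendly feedback of the older ingest functions without hiding the actual insert.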
I do like that approach, especially if decorated with something like `np.vectorize` so that we can write the logic for a single object but still pass an entire list of objects.
Would we then just write a long list like `ra = data_dict.get('ra', None)`?
> Confirmed that things like `s.comments = "this is a comment"` work, so one can build the object iteratively.
We could handle things like this with property setters, with nice error messages returned. For example, I have code like:

```python
from typing import Optional

import astropy.units as u

# Methods excerpted from a class
@staticmethod
def __assertquantity__(value: Optional[u.Quantity], optional: bool):
    if optional and value is not None and not isinstance(value, u.Quantity):
        raise AttributeError(f'Value {value} must be numeric or a quantity (or None)')
    elif not optional and not (isinstance(value, u.CompositeUnit) or isinstance(value, u.Unit)
                               or isinstance(value, u.IrreducibleUnit)):
        print(value, type(value))
        raise AttributeError(f'Value {value} must be a unit')
    return True

@property
def wunit(self) -> u.Unit:
    return self._wunit

@wunit.setter
def wunit(self, value):
    if not self.__assertquantity__(value, False):
        raise AttributeError('wunit must be an astropy unit')
    self._wunit = value
```
No need to fetch the individual columns; that gets unpacked as part of `**data_dict`. If a required column is not provided, SQLAlchemy would error out.
As for setters/getters, that's also already built into the ORM; see for example the validation we have in https://github.com/astrodbtoolkit/astrodb-template-db/blob/main/schema/schema_template.py#L179
We can expand that as needed, and while it's called validation it can also serve to do transforms (e.g., to extract the value of a `Quantity` object in a particular unit).
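As a sketch of what that ORM-level validation looks like with SQLAlchemy's `validates` decorator (a simplified stand-in table, not the actual template schema):

```python
from sqlalchemy import Column, Float, String
from sqlalchemy.orm import declarative_base, validates

Base = declarative_base()

class Sources(Base):
    """Simplified stand-in for the template schema's Sources table."""
    __tablename__ = "Sources"
    source = Column(String, primary_key=True)
    ra = Column(Float)
    dec = Column(Float)

    @validates("ra")
    def validate_ra(self, key, value):
        # Fires automatically on attribute assignment (including in the
        # constructor); the returned value is what gets stored, so this
        # hook can also transform inputs, not just reject them.
        if value is not None and not (0 <= value < 360):
            raise ValueError(f"ra={value} is outside [0, 360)")
        return value

s = Sources(source="V4046 Sgr", ra=273.54, dec=-32.79)
```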
As for passing a list, we could vectorize it, but SQLAlchemy already has logic to handle inserting lists. For example, we could rewrite the main logic as:
```python
data_list = []
for row in data_dict_list:
    datum = DBTable(**row)
    data_list.append(datum)

with db.session as session:
    session.add_all(data_list)
    session.commit()
```
One of the interesting things about this approach is that if you were doing it manually, you could have `data_list` contain a mix of objects (some Sources, some Photometry, some Spectra) and still use `session.add_all` to ingest them all in one call. I think writing it up this way is beyond the scope of what we want to do, but we can provide guidelines if people want to approach it that way.
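For completeness, here is a self-contained sketch of that mixed-list ingest against an in-memory SQLite database, using simplified stand-in tables rather than the real schema:

```python
from sqlalchemy import Column, Float, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Minimal stand-ins for two schema tables (not the real Toolkit schema)
class Sources(Base):
    __tablename__ = "Sources"
    source = Column(String, primary_key=True)
    ra = Column(Float)
    dec = Column(Float)

class Photometry(Base):
    __tablename__ = "Photometry"
    source = Column(String, primary_key=True)
    band = Column(String, primary_key=True)
    magnitude = Column(Float)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# A single add_all call can ingest a mixed list of ORM objects
mixed = [
    Sources(source="V4046 Sgr", ra=273.54, dec=-32.79),
    Photometry(source="V4046 Sgr", band="2MASS.J", magnitude=8.1),
]
with Session(engine) as session:
    session.add_all(mixed)
    session.commit()
    n_sources = session.query(Sources).count()
    n_phot = session.query(Photometry).count()
```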
It takes more advantage of the schema and is more... direct. @dr-rodriguez to provide an example.