BDNYC / astrodbkit

A Python module for astronomy to create, manage, and collaborate using SQL relational databases.
MIT License
2 stars 7 forks source link

Option to reshape/pivot photometry table #17

Open dr-rodriguez opened 8 years ago

dr-rodriguez commented 8 years ago

There should be an option to reshape the photometry table so that the bands are columns. For example: txt = 'SELECT * FROM photometry WHERE source_id=700' t = db.query(txt, fmt='table') df = t.to_pandas() df_new = df.pivot(index='source_id', columns='band', values='magnitude')

This has df_new with each band as a column name and the band magnitudes as the column values.

dr-rodriguez commented 8 years ago

Here's an example for when you have multiple magnitudes for a particular band: txt = 'SELECT p.source_id, p.band, p.magnitude FROM photometry AS p JOIN sources AS s WHERE p.source_id=s.id' t = db.query(txt, fmt='table') df = t.to_pandas() df2 = df.convert_objects(convert_numeric=True) pd.pivot_table(df2, values='p.magnitude', index=['p.source_id'], columns=['p.band'], aggfunc=np.mean)

hover2pi commented 7 years ago

This sounds like an easily constructed 'view' to me... though what do you do if a source has multi-epoch photometry?

dr-rodriguez commented 7 years ago

Indeed, a view would be the easiest to implement and I would favor that at the moment, though it requires the database administrator to know how to do that instead of being a built-in method.

For my tests I took the average of any multiple entries, but I'm not sure how one would preserve multiple entries per source. I would guess that if you have multiple epoch as more of a timeseries then either these should not be pivoted or they should be grouped by date/time but only for a single source.