LCOGT / mop

Microlensing Observation Portal
GNU General Public License v3.0

Optimizing the storage of lightcurve data #132

Closed rachel3834 closed 5 months ago

rachel3834 commented 5 months ago

By default, when a TOM ingests a lightcurve, it stores a DataProduct and creates individual ReducedDatums for all entries in that lightcurve, meaning that a ReducedDatum stores a single datapoint in the format (e.g.) {"error": 0.015744359583535893, "filter": "G", "magnitude": 17.37}.

The end result of this is that the ReducedDatum table in a TOM can get quite big. For example, even my test version of MOP on my local machine, which holds only a small subset of our full dataset, has 1,196,862 entries. The table is currently indexed by primary key only.
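To illustrate why a primary-key-only index matters for per-target retrieval, here is a minimal sketch using an in-memory SQLite database as a stand-in for the real backend. The table name `reduced_datum`, its columns, and the helper `query_plan` are hypothetical and simplified; they are not the actual TOM schema. It only shows how the query planner treats a lookup by target compared with a lookup by primary key.

```python
import sqlite3

# Hypothetical, simplified stand-in for the ReducedDatum table:
# only the primary key is indexed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reduced_datum ("
    "id INTEGER PRIMARY KEY, target_id INTEGER, timestamp TEXT, value TEXT)"
)
# Fill with placeholder rows spread over 100 made-up targets.
conn.executemany(
    "INSERT INTO reduced_datum (target_id, timestamp, value) VALUES (?, ?, ?)",
    [(i % 100, "2023-01-01T00:00:00", "{}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Lookup by target: no usable index, so every row is visited.
by_target = query_plan(
    conn, "SELECT value FROM reduced_datum WHERE target_id = ?", (42,)
)
# Lookup by primary key: resolved directly.
by_pk = query_plan(conn, "SELECT value FROM reduced_datum WHERE id = ?", (42,))

print(by_target)
print(by_pk)
```

The exact wording of the plan varies between SQLite versions, and other database backends will report this differently, so treat the output as indicative only.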

This is sub-optimal given the way these data are typically retrieved. Normally we load all of the timeseries photometry for a single target, spread across a few datasets, at once, and then restructure the output into arrays. Each of these steps takes separate DB operations.
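As a rough sketch of the restructuring step described above, the following takes per-datapoint records in the format quoted earlier and regroups them into per-filter arrays. The function name `to_arrays`, the `(timestamp, value)` row layout, and all timestamps and datapoints other than the first magnitude entry are made up for illustration; this is not MOP's actual code.

```python
import json
from collections import defaultdict

# Each row mimics one ReducedDatum: a timestamp plus a JSON value holding a
# single datapoint. Only the first value is taken from the example above;
# the timestamps and the other rows are invented for illustration.
rows = [
    ("2023-05-01T00:00:00",
     '{"error": 0.015744359583535893, "filter": "G", "magnitude": 17.37}'),
    ("2023-05-02T00:00:00", '{"error": 0.02, "filter": "G", "magnitude": 17.41}'),
    ("2023-05-02T01:00:00", '{"error": 0.05, "filter": "I", "magnitude": 16.90}'),
]


def to_arrays(rows):
    """Group single-datapoint records into per-filter lists."""
    lightcurves = defaultdict(
        lambda: {"timestamp": [], "magnitude": [], "error": []}
    )
    for timestamp, value in rows:
        point = json.loads(value)
        lc = lightcurves[point["filter"]]
        lc["timestamp"].append(timestamp)
        lc["magnitude"].append(point["magnitude"])
        lc["error"].append(point["error"])
    return dict(lightcurves)


lightcurves = to_arrays(rows)
print(sorted(lightcurves))
print(lightcurves["G"]["magnitude"])
```

Every datapoint has to be decoded and appended individually, which is the per-row overhead that grows with the size of the lightcurve.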

Two possible alternative arrangements:

rachel3834 commented 5 months ago

Closing this as I am converting it to a milestone, since it will involve multiple issues.