Optimizing the storage of lightcurve data

By default, when a TOM ingests a lightcurve, it stores a DataProduct and creates an individual ReducedDatum for every entry in that lightcurve. Each ReducedDatum therefore stores a single datapoint, in a format like {"error": 0.015744359583535893, "filter": "G", "magnitude": 17.37}.

The end result of this is that the ReducedDatum table in a TOM can get quite big. For example, even my test version of MOP on my local machine, which holds only a small subset of our full dataset, has 1,196,862 entries. The table is currently indexed by primary key only.

This is sub-optimal given the way these data are typically retrieved. Normally we load all of the timeseries photometry for a single target, spread across a few datasets, at once, and then restructure the output into arrays. This takes a number of separate DB operations.
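To make the retrieval pattern concrete, here is a rough sketch of the restructuring step in plain Python, with no TOM or Django dependency. The `rows` list and the `rows_to_arrays` helper are hypothetical names for illustration, and it assumes each point's time is available alongside the value dict quoted above. Only the first `G` point uses values from this issue; the others are made up.

```python
# Hypothetical per-point rows: (time_jd, value_dict), one row per ReducedDatum.
# Only the 17.37 mag point is taken from the issue text; the rest are invented.
rows = [
    (2457108.0, {"error": 0.02, "filter": "G", "magnitude": 17.35}),
    (2457106.99663, {"error": 0.015744359583535893, "filter": "G", "magnitude": 17.37}),
    (2457107.0, {"error": 0.03, "filter": "I", "magnitude": 16.9}),
]


def rows_to_arrays(rows):
    """Group per-point rows by filter and build time-ordered parallel lists."""
    arrays = {}
    # Sort by time so every per-filter list comes out in chronological order.
    for time, value in sorted(rows, key=lambda row: row[0]):
        band = arrays.setdefault(
            value["filter"], {"time": [], "magnitude": [], "error": []}
        )
        band["time"].append(time)
        band["magnitude"].append(value["magnitude"])
        band["error"].append(value["error"])
    return arrays


arrays = rows_to_arrays(rows)
```

With one row per datapoint, this loop touches every row for the target each time a lightcurve is loaded, which is the cost the proposals in this issue aim to reduce.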

Two possible alternative arrangements:

1. Store the timeseries photometry for a single dataset as a single ReducedDatum, similar to the format we use for model lightcurves (shown here), but also including uncertainties: {"lc_model_time": [2457106.99663, 2457107.0, 2457108.0, ..., 2463366.4927099994, 2463387.4910899997], "lc_model_magnitude": [17.370999654649786, 17.3709996546483, 17.370999654208696, ..., 17.370999645270114, 17.370999654649786]}
2. Add a new table for timeseries photometry.
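As a rough illustration of the first option, the sketch below packs per-point dicts into a single array-valued dict that could be stored as one JSON value. The key names (`lc_time`, `lc_magnitude`, `lc_error`, `lc_filter`) and the `pack_lightcurve` helper are assumptions made for this example, not an agreed schema, and the per-point `time` key is assumed to be available at ingest. Only the 17.37 mag point comes from this issue; the other values are invented.

```python
import json


def pack_lightcurve(points):
    """Collapse per-point dicts into one dict of time-ordered parallel lists."""
    ordered = sorted(points, key=lambda p: p["time"])
    return {
        "lc_time": [p["time"] for p in ordered],
        "lc_magnitude": [p["magnitude"] for p in ordered],
        "lc_error": [p["error"] for p in ordered],
        # Kept per point so a dataset with mixed filters still round-trips.
        "lc_filter": [p["filter"] for p in ordered],
    }


# Hypothetical input points for one dataset.
points = [
    {"time": 2457107.0, "magnitude": 17.36, "error": 0.02, "filter": "G"},
    {"time": 2457106.99663, "magnitude": 17.37, "error": 0.015744359583535893, "filter": "G"},
]

packed = pack_lightcurve(points)
# The packed dict is plain JSON, so it can be serialised as a single value.
restored = json.loads(json.dumps(packed))
```

The trade-off is that adding a single new point means rewriting the whole value, so this layout suits datasets that are ingested in bulk more than ones that grow point by point.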
