LCOGT / mop

Microlensing Observation Portal
GNU General Public License v3.0
0 stars 7 forks source link

Evaluate storing timeseries data as arrays #135

Closed rachel3834 closed 7 months ago

rachel3834 commented 7 months ago

Currently we stored timeseries data as ReducedDatums, which use JSON dictionaries. Issue #133 proved that storing the data as arrays is more efficient for retrieval. Postgres does support an array datatype which might be more efficient, though this would require adding a custom table to the TOM's default DataProducts models. Investigate these options.

rachel3834 commented 7 months ago

For reference, this is an interesting discussion of this topic from StackOverflow.

rachel3834 commented 7 months ago

From the TOM Toolkit's point of view, adding ArrayFields is problematic because we can't guarantee that all users will go for a Postgres DB. However, MOP does use a Postgres DB.

rachel3834 commented 7 months ago

ArrayFields can be specified without a fixed size but they do have to be rectangular, i.e. 2D. Which would be fine for typical timeseries data.

rachel3834 commented 7 months ago

One implementation option for MOP would be to add an app to MOP with it's own DB table(s) for timeseries photometry held as ArrayFields. The advantage to this would be to remove the conversion to and from a list of lists, necessary to store and retrieve data from a JSON object.

Migrating MOP's current table of ReducedDatums to the photometry table would then involve

rachel3834 commented 7 months ago

The main alternative is to change MOP's current method of JSON object storage from separate ReducedDatums to ReducedDatums with a timeseries array. This would involve all of the same steps as above, but would continue to require us to convert the datatype constantly.

rachel3834 commented 7 months ago

After some benchmarking, results indicated that querying the reduceddatums is not the task that takes the most time, and instead its the target extra_fields that are the biggest time sink. The time required to sort through the reduceddatums isn't a limiting factor, so given the invasive nature of the proposed changes for minimal gain, I am halting this here for now to focus on the extra_fields.