askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.
https://vast-survey.org/vast-pipeline/
MIT License

Remove MeasurementPair model #590

Closed. marxide closed this pull request 2 years ago.

marxide commented 2 years ago

Inserting the MeasurementPair objects into the database is a significant time sink during a pipeline run. During a recent run where an epoch was added to a full pilot survey run, the pairs upload alone took at least 3 hours.

The pairs stored in the database are only ever retrieved to make the pair graph on the source detail page. Since a single source will only have ~tens of measurements, computing the pairs and their metrics on demand for the graph is not an issue. All other uses of the measurement pairs either use the parquet file or the aggregated pair metrics that are stored in the Source model.
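For scale, a source with n measurements has n(n-1)/2 unique pairs, so even 30 measurements give only 435 pairs. Below is a minimal sketch of computing them on demand. It assumes each measurement is a hypothetical `(measurement_id, flux, flux_err)` tuple. It also assumes the commonly used two-epoch variability metrics: the t-statistic Vs = (S_a - S_b) / sqrt(σ_a² + σ_b²) and the modulation index m = 2(S_a - S_b) / (S_a + S_b). The pipeline's own function names, columns, and sign conventions may differ.

```python
from itertools import combinations
from math import sqrt


def pair_metrics(flux_a, err_a, flux_b, err_b):
    """Return (vs, m) for one pair of flux measurements (assumed definitions).

    vs: flux difference divided by the combined uncertainty (t-statistic).
    m:  flux difference relative to the mean flux (modulation index).
    """
    vs = (flux_a - flux_b) / sqrt(err_a**2 + err_b**2)
    m = 2 * (flux_a - flux_b) / (flux_a + flux_b)
    return vs, m


def compute_pairs(measurements):
    """Compute metrics for every unique pair of one source's measurements.

    `measurements` is a list of (measurement_id, flux, flux_err) tuples,
    a hypothetical input shape chosen for this sketch. Pairs are produced
    in itertools.combinations order, each unordered pair exactly once.
    """
    pairs = []
    for (id_a, f_a, e_a), (id_b, f_b, e_b) in combinations(measurements, 2):
        vs, m = pair_metrics(f_a, e_a, f_b, e_b)
        pairs.append({"a": id_a, "b": id_b, "vs": vs, "m": m})
    return pairs
```

Because the work is quadratic only in the number of measurements of a single source, doing this per request for the source detail page is cheap compared with storing every pair for every source in the database.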

This PR removes the MeasurementPair model and replaces it with a simple dataclass. The dataclass is only a convenience container for working with measurement pairs within the pipeline; the pairs are no longer added to the database.
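A rough sketch of what such an in-memory container might look like. The class name mirrors the removed model, but the fields (`source_id`, `measurement_a_id`, `measurement_b_id`, `vs_peak`, `m_peak`) and the helper `most_significant_pair` are hypothetical, chosen only to illustrate working with pairs without any database round trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementPair:
    """Plain in-memory record for one measurement pair.

    Field names are illustrative and not taken from the pipeline.
    Nothing here is saved to the database.
    """
    source_id: int
    measurement_a_id: int
    measurement_b_id: int
    vs_peak: float
    m_peak: float


def most_significant_pair(pairs):
    """Return the pair with the largest |vs_peak| (hypothetical helper).

    Raises ValueError if `pairs` is empty, as max() does.
    """
    return max(pairs, key=lambda p: abs(p.vs_peak))
```

Making the dataclass frozen keeps each pair immutable and gives value-based equality for free, which is convenient when comparing pairs from different runs.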

I've tested this by performing a run of a single field with 2 epochs using the dev branch. I then added another epoch to this run and ran it again (add mode). I repeated this using this branch and compared the resulting measurement_pairs.parquet files to those generated by the current dev branch. The pairs agree in all cases.
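One way to check that two runs produced the same pairs is to compare them as unordered measurement-id pairs, so that (a, b) and (b, a) count as the same pair. This sketch assumes the two id columns have already been read out of each measurement_pairs.parquet (for example with pandas); it checks only which pairs exist, not their metric values.

```python
def pair_keys(pairs):
    """Return the set of unordered (measurement_a_id, measurement_b_id) pairs.

    Sorting the two ids makes the comparison independent of which one
    is listed first.
    """
    return {tuple(sorted(p)) for p in pairs}


def diff_pairs(reference, candidate):
    """Return (missing, extra).

    missing: pairs present only in `reference`.
    extra:   pairs present only in `candidate`.
    Both are empty when the two runs agree.
    """
    ref, cand = pair_keys(reference), pair_keys(candidate)
    return ref - cand, cand - ref
```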

marxide commented 2 years ago

> I'm not sure if this is needed anymore as this was to remove the already uploaded ones from the measurement pairs.

You're right, it isn't needed. I thought it was required to add in the new pairs but I guess that's done elsewhere. Tested a run in add mode with the block removed and the pairs came out the same.

> And I think it's fine, but there's also the code in the restore command; I think that is dealt with via the cascade.

I'm not sure what you're referring to here. Did I miss something in the restore command?

ajstewart commented 2 years ago

> You're right, it isn't needed. I thought it was required to add in the new pairs but I guess that's done elsewhere. Tested a run in add mode with the block removed and the pairs came out the same.

Good, yeah I think that was all to do with the database uploading in that section. Otherwise the pairs are already calculated and ready to go.

> I'm not sure what you're referring to here. Did I miss something in the restore command?

I meant the code in vast_pipeline/management/commands/restorepiperun.py, but I don't think I was thinking straight at the time. With the pairs now completely removed from the database, there's nothing to be done there: the command just replaces the pairs parquet with the old pairs parquet, and of course there's nothing to sort out in the database!