COG-UK / dipi-group

Data integrity and pipeline integration working group

PHE1 MDV performance #199

Closed SamStudio8 closed 2 years ago

SamStudio8 commented 2 years ago

Just like everything else in our systems, the v3 API endpoint for sharing linkage with PHE/UKHSA has steadily grown slower.

The v3 API was written to leverage the Django REST framework and deployed an ingenious method to dynamically select fields for serialisation, which was hopefully going to herald a new era of data management. Unfortunately, as previously lamented, the way Majora links artifacts and processes together leads to poor performance when serialising large numbers of objects.

It would be nice if it wasn't slow.

SamStudio8 commented 2 years ago

Commit https://github.com/CLIMB-COVID/majora2/commit/61255f246c2a95e59b040557d9068c68b041ff9b hard-codes a PHE1-FAST MDV. The definition is strictly followed, and a modest test suite covers the basics of the function that replaces the magic DRF dynamic view.

Merely counting the size of the two query sets proves promising as they are at least the same size!

>>> mdv = models.MajoraDataview.objects.get(code_name="PHE1")
>>> queryset = apps.get_model("majora2", mdv.entry_point).objects.all()
>>> queryset.filter(mdv.get_filters()).count()
2114223
>>> len(mdv_tasks.subtask_get_mdv_v3_phe1_faster())
2114223

I'll liaise with FS to test this out this week. The performance delta is going to blow their socks off.

This isn't so troublesome for the DA1 view as it has not grown at the same rate (and linked samples are additionally removed from DA1, so it does not grow constantly). However, this approach would work to speed up the DA1 view in future if required.

SamStudio8 commented 2 years ago

PHE1-FAST appears to complete in around 60 seconds, as opposed to 1h+. I will never design something that falls into the trap of that pesky n+1 query problem again.
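For anyone unfamiliar with the n+1 query problem mentioned above, here is a small self-contained illustration using Python's standard `sqlite3` module. It is a sketch under stated assumptions: the `artifact` and `process` tables are a made-up two-table schema, not Majora's real models, and this is not how the linked commit implements PHE1-FAST. It only shows why issuing one extra query per object scales badly compared with fetching everything in one joined query.

```python
import sqlite3

# Hypothetical schema: each artifact has related process rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE process (id INTEGER PRIMARY KEY, artifact_id INTEGER, kind TEXT);
""")
conn.executemany(
    "INSERT INTO artifact VALUES (?, ?)",
    [(i, f"artifact-{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO process (artifact_id, kind) VALUES (?, ?)",
    [(i, "sequencing") for i in range(1, 101)],
)


def serialise_n_plus_one(conn):
    """One query for the artifacts, then one more query per artifact."""
    queries = 1
    rows = conn.execute("SELECT id, name FROM artifact ORDER BY id").fetchall()
    out = []
    for artifact_id, name in rows:
        kinds = [
            kind
            for (kind,) in conn.execute(
                "SELECT kind FROM process WHERE artifact_id = ? ORDER BY id",
                (artifact_id,),
            )
        ]
        queries += 1
        out.append({"name": name, "processes": kinds})
    return out, queries


def serialise_joined(conn):
    """A single joined query, grouped into records in Python."""
    out = {}
    cursor = conn.execute("""
        SELECT a.id, a.name, p.kind
        FROM artifact a
        LEFT JOIN process p ON p.artifact_id = a.id
        ORDER BY a.id, p.id
    """)
    for artifact_id, name, kind in cursor:
        record = out.setdefault(artifact_id, {"name": name, "processes": []})
        if kind is not None:  # artifacts with no processes yield a NULL kind
            record["processes"].append(kind)
    return list(out.values()), 1


slow, slow_queries = serialise_n_plus_one(conn)
fast, fast_queries = serialise_joined(conn)
```

Both functions produce the same records, but the first issues one query per artifact on top of the initial fetch, so the query count grows with the table. In Django the usual ORM-level remedies are `select_related` and `prefetch_related`; whether those would have been sufficient for the dynamic v3 view is not something this sketch addresses.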