appsembler / figures

Reporting and data retrieval app for Open edX
MIT License
44 stars 37 forks source link

Option to use raw SQL to improve performance of site monthly metrics with large #s of StudentModules #388

Closed bryanlandia closed 3 years ago

bryanlandia commented 3 years ago

Change description

Tries to resolve performance issues running backfill_figures_monthly_metrics or even just running the site monthly metrics each month by schedule, in cases where there is a large number of students and StudentModules. I was not able to complete a month on a site with 95k students and ~7 million StudentModule records without pegging system memory and making the whole site unresponsive.

Using a raw SQL query provides performance benefits over using Python/Django... The main bottleneck with Django is the calculation of distinct() student_ids on the QuerySets. Raw SQL DISTINCT() function is significantly more performant.

This PR provide a use_raw parameter to pipelines.site_monthly_metrics.fill_month and "upstream" functions and management commands.

Includes tests, though the actual SQL statement has to be changed to accommodate the differences in sqllite.

Type of change

Checklists

Development

Security

Code review

bryanlandia commented 3 years ago

Pulled back to Draft — looks like some CI checks failed.