Tries to resolve performance problems when running backfill_figures_monthly_metrics, or even the regularly scheduled monthly site metrics job, on sites with a large number of students and StudentModule records. On a site with 95k students and ~7 million StudentModule records, I could not complete a single month without pegging system memory and making the whole site unresponsive.
Using a raw SQL query performs much better here than doing the work through Python/Django. The main bottleneck on the Django side is calculating distinct() student_ids on the QuerySets. Letting the database evaluate SQL DISTINCT directly is significantly faster.
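To illustrate the difference, here is a minimal sketch using the stdlib sqlite3 module instead of Django so it runs standalone. The table and column names (courseware_studentmodule, student_id, modified) are assumptions modeled loosely on edx-platform's StudentModule, and active_students_in_python stands in for any approach that pulls every student_id into Python before deduplicating. It is not the actual Figures code.

```python
import sqlite3

# Assumed schema, modeled loosely on edx-platform's courseware_studentmodule.
SCHEMA = """
CREATE TABLE courseware_studentmodule (
    id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    modified TEXT NOT NULL
)
"""


def active_students_in_python(conn, start, end):
    # Every matching row is transferred into Python before deduplication,
    # so the work scales with the number of StudentModule rows,
    # not with the number of distinct students.
    cursor = conn.execute(
        "SELECT student_id FROM courseware_studentmodule "
        "WHERE modified >= ? AND modified < ?",
        (start, end),
    )
    return len({student_id for (student_id,) in cursor})


def active_students_raw_sql(conn, start, end):
    # The database performs the DISTINCT and returns a single row.
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT student_id) FROM courseware_studentmodule "
        "WHERE modified >= ? AND modified < ?",
        (start, end),
    ).fetchone()
    return count


def make_demo_db():
    # Tiny in-memory dataset: student 1 has two January records.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO courseware_studentmodule (student_id, modified) VALUES (?, ?)",
        [
            (1, "2020-01-05 09:00:00"),
            (1, "2020-01-20 12:00:00"),
            (2, "2020-01-10 08:30:00"),
            (3, "2020-02-03 14:00:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(active_students_raw_sql(conn, "2020-01-01", "2020-02-01"))  # 2
```

Both functions return the same count; the difference is how much data leaves the database. A Django QuerySet evaluated without .iterator() also caches every row it loads, which adds memory pressure on top of the row transfer.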
This PR provides a use_raw parameter for pipelines.site_monthly_metrics.fill_month and for the "upstream" functions and management commands that call it.
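A minimal sketch of threading one flag from a management command down to fill_month. Everything here is hypothetical: the --use-raw option name, the backfill_monthly_metrics helper, and the two counting stubs stand in for the real Figures code, which this sketch does not reproduce. The argparse parser mirrors what a Django management command declares in add_arguments().

```python
import argparse
from datetime import date


def active_users_queryset(site_id, month):
    """Hypothetical stand-in for the existing Django QuerySet path."""
    return {"site_id": site_id, "month": month, "method": "queryset"}


def active_users_raw_sql(site_id, month):
    """Hypothetical stand-in for the new raw SQL path."""
    return {"site_id": site_id, "month": month, "method": "raw_sql"}


def fill_month(site_id, month, use_raw=False):
    # The flag only selects the counting strategy.
    counter = active_users_raw_sql if use_raw else active_users_queryset
    return counter(site_id, month)


def backfill_monthly_metrics(site_id, months, use_raw=False):
    # "Upstream" callers forward use_raw unchanged, so one flag
    # controls the whole run.
    return [fill_month(site_id, month, use_raw=use_raw) for month in months]


def build_parser():
    # Equivalent of a management command's add_arguments(parser) hook.
    parser = argparse.ArgumentParser(prog="backfill_figures_monthly_metrics")
    parser.add_argument("--site", type=int, required=True)
    parser.add_argument(
        "--use-raw",
        action="store_true",
        help="Count distinct students with a raw SQL query",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--site", "1", "--use-raw"])
    months = [date(2020, 1, 1), date(2020, 2, 1)]
    for row in backfill_monthly_metrics(args.site, months, use_raw=args.use_raw):
        print(row)
```

In this sketch use_raw defaults to False, so callers that don't opt in keep the QuerySet behavior.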
Includes tests, though the SQL statement has to be changed when running against SQLite to accommodate differences in its SQL dialect.
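The description doesn't say which constructs differ, so the following only illustrates the kind of dialect split involved, assuming month filtering is one of them: MySQL has YEAR() and MONTH(), which SQLite lacks. Table and column names are assumptions, and the backend keys mirror the values of Django's connection.vendor ("mysql", "sqlite"). This sketch calls sqlite3 directly. Going through Django's connection.cursor() would instead mean %s placeholders on every backend, with literal % signs doubled when parameters are passed.

```python
import sqlite3

# Hypothetical per-backend queries. Placeholders follow each DB-API
# driver's paramstyle: MySQLdb uses %s, sqlite3 uses ?.
ACTIVE_USERS_SQL = {
    "mysql": (
        "SELECT COUNT(DISTINCT student_id) FROM courseware_studentmodule "
        "WHERE YEAR(modified) = %s AND MONTH(modified) = %s"
    ),
    # SQLite has no YEAR()/MONTH(), so extract the parts with strftime().
    "sqlite": (
        "SELECT COUNT(DISTINCT student_id) FROM courseware_studentmodule "
        "WHERE CAST(strftime('%Y', modified) AS INTEGER) = ? "
        "AND CAST(strftime('%m', modified) AS INTEGER) = ?"
    ),
}


def active_users_sql(vendor):
    """Return the query for a backend name such as Django's connection.vendor."""
    try:
        return ACTIVE_USERS_SQL[vendor]
    except KeyError:
        raise ValueError(f"no active-users query for backend {vendor!r}") from None


def count_active_users_sqlite(conn, year, month):
    # Run the SQLite variant with integer year and month parameters.
    (count,) = conn.execute(active_users_sql("sqlite"), (year, month)).fetchone()
    return count
```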
Type of change
[x] Bug fix (fixes an issue)
[ ] New feature (adds functionality)
Checklists
Development
[x] Lint rules pass locally
[x] Application changes have been tested thoroughly
[x] Automated tests covering modified code pass
Security
[x] Security impact of change has been considered
[x] Code follows company security practices and guidelines
Code review
[x] Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
[x] "Ready for review" label attached and reviewers assigned
[ ] Changes have been reviewed by at least one other contributor
[ ] Pull request linked to task tracker where applicable