appsembler / figures

Reporting and data retrieval app for Open edX
MIT License
44 stars 37 forks source link

Updates pipeline enrollment metrics queries to improve performance #226

Closed johnbaldwin closed 4 years ago

johnbaldwin commented 4 years ago

This commit should dramatically improve the query performance for the enrollment metrics pipeline

What was wrong?

Queries were very slow because of a 'LMIT 1' issues with MySQL. For a starting point, see here

https://stackoverflow.com/questions/15460133/mysql-dramatically-slower-query-execution-if-use-limit-1-instead-of-limit-5

In Django, we were doing a filter query that returns a single record or None. Examples:

StudentModule.objects.filter(**filter_args).latest('modified')
StudentModule.objects.filter(**filter_args).order_by('-modified).first()

Query functions such as latest, first, last and so on add a LIMIT 1 to the underlying SQL query, which has apparent negative performance on the query analyzer

To address this, we do two things

  1. For the specified course, we filter the StudentModule records
  2. For the specifid learner in the course, we filter

Also, LearnerCourseGradesMetrics queries are slow as the model needs indexing on fields including site, course, and learner. We address this twofold

  1. We will add indexing to the needed fields after we prune old records. This is so we're not indexing records we are just going to delete anyway
  2. We filter all LearnerCourseGradeMetrics records for the specified course

This commit performs #2 above to then filter from this queryset to find LearnerCourseGradeMetrics records for the specified learner in the course

Enrollment Metrics tests have been updated to reflect changes in the production code

codecov-commenter commented 4 years ago

Codecov Report

Merging #226 into master will decrease coverage by 0.04%. The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #226      +/-   ##
==========================================
- Coverage   91.33%   91.28%   -0.05%     
==========================================
  Files          38       38              
  Lines        1950     1951       +1     
==========================================
  Hits         1781     1781              
- Misses        169      170       +1     
Impacted Files Coverage Δ
figures/pipeline/enrollment_metrics.py 98.36% <87.50%> (+0.02%) :arrow_up:
figures/sites.py 66.27% <0.00%> (-1.17%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d96dd9b...d596ffa. Read the comment docs.