appsembler / figures

Reporting and data retrieval app for Open edX
MIT License
44 stars 37 forks source link

faster total_site_certificates_as_of_date #473

Closed OmarIthawi closed 1 year ago

OmarIthawi commented 1 year ago

avoid pulling thousands of coursedailymetric instances just for count

using queryset.first() to utilize SQL LIMIT 1 : https://github.com/django/django/blob/2a62cdcfec85938f40abb2e9e6a9ff497e02afe8/django/db/models/query.py#L651-L654

Debugging doc: https://appsembler.atlassian.net/wiki/spaces/RT/pages/2680783079/RED-3513+-+Debug+slow+response+times#Figures-slow-query

johnbaldwin commented 1 year ago

@OmarIthawi I'm reviewing this PR. I'm going to do some poking around. I just want to add a note of caution at this point.

Sometimes MySQL has degraded performance when using LIMIT. If you do a web search with "mysql slow limit" and "mysql slow limit 1", you should see some pages about this. My knowledge on this issue is still quite limited. The problem may "just" be with offsets, however, I'm not certain of that.

johnbaldwin commented 1 year ago

@OmarIthawi What do you think of skipping the initial check if there are records.

This should work. I ran it in the Django shell for dates that would return both records and return no records

val = CourseDailyMetrics.objects.filter(site=site, date_for__lte=date_for).aggregate(
    Sum('num_learners_completed'))['num_learners_completed__sum']
return val if val else 0

You probably also want to make sure that test coverage covers both cases

edit this might not work. I need to look over what's being collected so we're duplicating counts. If the 'num_learners_completed' is just for that given day, we're good. If it is a running total, then my suggestion is not going to work

edit Checked the data. It looks cumulative. Sigh. This shows I prematurely optimized. I should have retained the value just for the given day

OmarIthawi commented 1 year ago

Sometimes MySQL has degraded performance when using LIMIT. If you do a web search with "mysql slow limit" and "mysql slow limit 1", you should see some pages about this. My knowledge on this issue is still quite limited. The problem may "just" be with offsets, however, I'm not certain of that.

Thanks @johnbaldwin. Yes, I think I've had this and sadly it sometimes slows even without OFFSET. In this case we're optimizing python for sure since if qs uses to evaluate all 83k CourseDailyMetric instances, with LIMIT this is no longer the case. For SQL speed, we'll have to see in practice how this fix affects the Figures API performance.