Closed pbanaszkiewicz closed 9 years ago
On Sat, May 23, 2015 at 08:56:58AM -0700, Piotr Banaszkiewicz wrote:
all_badges
:
- a "count in a loop" solved by using annotations:
annotate(num_awarded=Count('award'))
… Now the biggest issue istask_set.instructors.count
, which we're using a lot. @wking, do you perhaps have any idea for a solution?
Annotate with a Count. See 1, although it sounds like you're alreay using this for the award counts.
@wking:
Annotate with a Count.
Is that possible with a filter on related object? Ie. Annotate(Count(role__name="instructor"))? I think I tried that and it didn't work.
On Sun, May 24, 2015 at 11:11:17AM -0700, Piotr Banaszkiewicz wrote:
Annotate with a Count.
Is that possible with a filter on related object? Ie. Annotate(Count(role__name="instructor"))? I think I tried that and it didn't work.
Ah, right. Conditional aggregation has only landed recently 1. I can't find docs for these new features, but the example at the end of ticket 11305 works for me:
from django.db.models import IntegerField, Count from django.db.models.expressions import Case, Value, When from workshops.models import Event
Event.objects.annotate( instructor_count=Count(Case(When(taskrolename='instructor', then=Value(1)), output_field=IntegerField())))[0].instructor_count
Okay, nice find. It does work, but the overhead is terrible. 6 queries in 171ms compared to 31 queries in 11ms.
This method, however, shines when we display all events on one page (no pagination): 6.5s load time versus 1.2s.
@wking what do you think? Should we switch? If we display all items on one page we could easily enable DataTables for them. Otherwise we'd have to use AJAX endpoints to load the data.
Do you think in the long run we'd be better with less queries?
Or maybe the visible overhead happens only because of SQLite and it's not present in other DBs?
I don't think we should worry about optimizing queries yet - it's fast enough for our purposes, will still be fast enough with twice the data, and since the views are likely to shift significantly in the next 4-6 weeks as we get real users, we could well be optimizing things we don't actually need.
You're right, but datatables are scheduled for v0.4, and they need to have a quick site. That's why I was thinking about optimizations.
Then let's push data tables back to 0.5.
On Sun, May 24, 2015 at 02:43:13PM -0700, Piotr Banaszkiewicz wrote:
Okay, nice find. It does work, but the overhead is terrible. 6 queries in 171ms compared to 31 queries in 11ms.
When we can reduce the number of queries, I think that's a good thing. Slow queries can usually be optimized around with additional indexes. EXPLAIN QUERY PLAN is useful for figuring out why a given query is slow 1. But any problems we can push into the database layer and solve there are problems that don't need us maintaining Python code.
I also think that, while it's useful to keep chipping away at these things where we see a clear path forward to avoid building up too much technical debt, we don't need to solve all of the performance problems in one shot. So if something works well enough and we don't have time to get to the bottom of how to make it better, we can punt until we do have that time (or the current approach is no longer working well enough).
Thank you both, @wking and @gvwilson, for feedback.
I think DataTables are a very important feature for other admins than Greg himself (as he's very familiar with the UI and probably knows who's organizing what workshops). I don't want to push them for later.
I have some code for optimizations in https://github.com/pbanaszkiewicz/amy/tree/optimization, we might need it in a near future, so I save it there.
Cheers! :)
I was tinkering with DataTables, and found out that some of our views use a terrific number of SQL queries and/or don't have good performance for other reasons.
We need to address these issues before implementing DataTables.
Bottlenecks I found:
index
:event.site
solved by usingselect_related('site')
: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#select-relatedevent.task_set.instructors.count
unsolvedall_events
:event.tags.all|join
solved by usingprefetch_related('tags')
: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#prefetch-relatedevent.site
solved by usingprefetch_related('site')
event.task_set.instructors.count
unsolvedall_tasks
:select_related('event', 'person', 'role')
solved the issue of querying related objectsall_badges
:annotate(num_awarded=Count('award'))
instructors
:debrief
:all_tasks
task.person.task_set.instructors.count
not solvedexport/instructors
:prefetch_related('person_set')
solvedevent_details
:all_tasks
event_edit
:all_tasks
There are a small number of bottlenecks in edit pages, and probably when #285 merges we'll have better DOM performance.
Now the biggest issue is
task_set.instructors.count
, which we're using a lot. @wking, do you perhaps have any idea for a solution?