carpentries / amy

A web-based workshop administration application built using Django.
https://amy.carpentries.org
MIT License
114 stars 72 forks source link

Views optimization #293

Closed pbanaszkiewicz closed 9 years ago

pbanaszkiewicz commented 9 years ago

I was tinkering with DataTables, and found out that some of our views use a terrific number of SQL queries and/or don't have good performance for other reasons.

We need to address these issues before implementing DataTables.

Bottlenecks I found:

There are a small number of bottlenecks in edit pages, and probably when #285 merges we'll have better DOM performance.

Now the biggest issue is task_set.instructors.count, which we're using a lot. @wking, do you perhaps have any idea for a solution?

wking commented 9 years ago

On Sat, May 23, 2015 at 08:56:58AM -0700, Piotr Banaszkiewicz wrote:

  • all_badges:
    • a "count in a loop" solved by using annotations: annotate(num_awarded=Count('award')) … Now the biggest issue is task_set.instructors.count, which we're using a lot. @wking, do you perhaps have any idea for a solution?

Annotate with a Count. See 1, although it sounds like you're alreay using this for the award counts.

pbanaszkiewicz commented 9 years ago

@wking:

Annotate with a Count.

Is that possible with a filter on related object? Ie. Annotate(Count(role__name="instructor"))? I think I tried that and it didn't work.

wking commented 9 years ago

On Sun, May 24, 2015 at 11:11:17AM -0700, Piotr Banaszkiewicz wrote:

Annotate with a Count.

Is that possible with a filter on related object? Ie. Annotate(Count(role__name="instructor"))? I think I tried that and it didn't work.

Ah, right. Conditional aggregation has only landed recently 1. I can't find docs for these new features, but the example at the end of ticket 11305 works for me:

from django.db.models import IntegerField, Count from django.db.models.expressions import Case, Value, When from workshops.models import Event

Event.objects.annotate( instructor_count=Count(Case(When(taskrolename='instructor', then=Value(1)), output_field=IntegerField())))[0].instructor_count

2

pbanaszkiewicz commented 9 years ago

Okay, nice find. It does work, but the overhead is terrible. 6 queries in 171ms compared to 31 queries in 11ms.

This method, however, shines when we display all events on one page (no pagination): 6.5s load time versus 1.2s.

@wking what do you think? Should we switch? If we display all items on one page we could easily enable DataTables for them. Otherwise we'd have to use AJAX endpoints to load the data.

Do you think in the long run we'd be better with less queries?

Or maybe the visible overhead happens only because of SQLite and it's not present in other DBs?

gvwilson commented 9 years ago

I don't think we should worry about optimizing queries yet - it's fast enough for our purposes, will still be fast enough with twice the data, and since the views are likely to shift significantly in the next 4-6 weeks as we get real users, we could well be optimizing things we don't actually need.

pbanaszkiewicz commented 9 years ago

You're right, but datatables are scheduled for v0.4, and they need to have a quick site. That's why I was thinking about optimizations.

gvwilson commented 9 years ago

Then let's push data tables back to 0.5.

wking commented 9 years ago

On Sun, May 24, 2015 at 02:43:13PM -0700, Piotr Banaszkiewicz wrote:

Okay, nice find. It does work, but the overhead is terrible. 6 queries in 171ms compared to 31 queries in 11ms.

When we can reduce the number of queries, I think that's a good thing. Slow queries can usually be optimized around with additional indexes. EXPLAIN QUERY PLAN is useful for figuring out why a given query is slow 1. But any problems we can push into the database layer and solve there are problems that don't need us maintaining Python code.

I also think that, while it's useful to keep chipping away at these things where we see a clear path forward to avoid building up too much technical debt, we don't need to solve all of the performance problems in one shot. So if something works well enough and we don't have time to get to the bottom of how to make it better, we can punt until we do have that time (or the current approach is no longer working well enough).

pbanaszkiewicz commented 9 years ago

Thank you both, @wking and @gvwilson, for feedback.

I think DataTables are a very important feature for other admins than Greg himself (as he's very familiar with the UI and probably knows who's organizing what workshops). I don't want to push them for later.

I have some code for optimizations in https://github.com/pbanaszkiewicz/amy/tree/optimization, we might need it in a near future, so I save it there.

Cheers! :)