DocNow / diffengine

track changes to the news, where news is anything with an RSS feed
MIT License

Add an index to the EntryVersion "url" column #52

Closed: ryanfb closed this 5 years ago

ryanfb commented 5 years ago

After #51 (adding a limit to the version select statement), I noticed that this query was still taking quite a large portion of the runtime for diffengine instances with large databases. As it turns out, adding an index to the EntryVersion url column makes this query (and diffengine as a whole) run much faster. This change both sets an index in the peewee model and runs a migration that tries to add the index to any existing database (adapted from this StackOverflow answer).
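To illustrate why the index matters, here is a minimal sketch. It is not the PR's code: the PR works through peewee (where a field-level index is typically declared with `index=True`) plus a migration adapted from StackOverflow. This sketch instead uses Python's built-in `sqlite3` module, assuming an SQLite backend and a simplified, hypothetical `entryversion` table and `entryversion_url` index name. `CREATE INDEX IF NOT EXISTS` keeps the "migration" safe to rerun on fresh and existing databases, and `EXPLAIN QUERY PLAN` shows the url lookup switching from a full table scan to an index search.

```python
import sqlite3

# Hypothetical, simplified schema: diffengine's real EntryVersion model has
# more columns; only "url" and "created" matter for this illustration.
SCHEMA = """
CREATE TABLE entryversion (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    created TEXT NOT NULL
)
"""

# The kind of lookup that was slow: the newest version for a given URL.
LOOKUP = (
    "SELECT id, url, created FROM entryversion "
    "WHERE url = ? ORDER BY created DESC LIMIT 1"
)


def query_plan(conn, sql, params):
    """Return SQLite's EXPLAIN QUERY PLAN detail text joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has 4 columns; the human-readable detail is the last one.
    return " | ".join(row[3] for row in rows)


def add_url_index(conn):
    """Idempotent 'migration': a no-op if the index already exists."""
    conn.execute("CREATE INDEX IF NOT EXISTS entryversion_url ON entryversion (url)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO entryversion (url, created) VALUES (?, ?)",
    [
        (f"https://example.com/story/{i % 1000}", f"2019-01-01T00:00:{i % 60:02d}")
        for i in range(10000)
    ],
)
conn.commit()

params = ("https://example.com/story/42",)
before = query_plan(conn, LOOKUP, params)  # full table scan
add_url_index(conn)
add_url_index(conn)  # running the migration twice is harmless
after = query_plan(conn, LOOKUP, params)  # index search on url

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, but the shape is the point: without the index every lookup reads the whole table, while with it SQLite jumps straight to the rows for that URL and only sorts those.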

edsu commented 5 years ago

Lovely, thanks @ryanfb ... and thanks for pushing diffengine to its limits, and past them :-)

edsu commented 5 years ago

PS. How did you notice that specific query was taking so much time? Was it an educated guess, or did you do some fancy profiling tricks?

ryanfb commented 5 years ago

Using py-spy to attach to a running instance, I could see where the time was being spent: specifically, the flame graph showed the call stack of the statements leading to the execute_sql call, which was taking almost 100% of the time.
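py-spy is a sampling profiler that attaches to an already running process from the outside (for example, `py-spy record -o profile.svg --pid <PID>` writes a flame graph). When you can instead run the code yourself, the standard library's `cProfile` can surface the same kind of hotspot. Note that it is a different technique: a deterministic, in-process profiler rather than a sampler. The sketch below profiles a hypothetical `hypothetical_feed_check()` loop doing "latest version for this URL" lookups against an unindexed SQLite table. The table, function name, and sizes are illustrative assumptions, not diffengine's code.

```python
import cProfile
import pstats
import sqlite3


def build_db(n_rows=10000):
    """Create an in-memory table with no index on url (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entryversion (id INTEGER PRIMARY KEY, url TEXT, created TEXT)")
    conn.executemany(
        "INSERT INTO entryversion (url, created) VALUES (?, ?)",
        [(f"https://example.com/story/{i % 2000}", f"{i:08d}") for i in range(n_rows)],
    )
    conn.commit()
    return conn


def hypothetical_feed_check(conn, n_lookups=200):
    """Hypothetical stand-in for a diffengine run: many per-URL version lookups."""
    for i in range(n_lookups):
        conn.execute(
            "SELECT id FROM entryversion WHERE url = ? ORDER BY created DESC LIMIT 1",
            (f"https://example.com/story/{i}",),
        ).fetchone()


conn = build_db()
profiler = cProfile.Profile()
profiler.enable()
hypothetical_feed_check(conn)
profiler.disable()

# Sort by time spent inside each function itself; sqlite3's execute should dominate.
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)
```

Because each lookup has to scan the whole table, nearly all of the measured time lands in the `execute` call, the same pattern py-spy's flame graph revealed for `execute_sql`.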

edsu commented 5 years ago

Ahh, very nice ... thanks for the tip!