UCL-RITS / rse-classwork-2020


Profiling code #186

Open ageorgou opened 3 years ago

We have seen how to measure the total time that a function takes to run (#185), but that doesn't help us with knowing which parts of the code are slow!

To look into that, we need a different tool, called a profiler. Python comes with its own profiler, but we will use a more convenient tool.

Setup

This exercise works in IPython or Jupyter notebooks and uses two "magic" commands available there. The %prun magic should be available with every installation of IPython/Jupyter. However, you may need to install the second magic (%lprun) separately. If you use Anaconda, run conda install line_profiler from a terminal. Otherwise, use pip install line_profiler.
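If you are unsure whether the packages are already installed, a quick check from Python can tell you before you start. The following is a minimal sketch; the helper name has_module is made up for this illustration and is not part of IPython or line_profiler.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when no top-level module of that name exists
    return importlib.util.find_spec(name) is not None


# Report on the two packages this exercise relies on
for module in ("IPython", "line_profiler"):
    status = "found" if has_module(module) else "missing"
    print(f"{module}: {status}")
```

If line_profiler is reported as missing, install it with one of the commands above and run the check again.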

Using profiling tools in IPython/Jupyter notebook

The %prun magic gives us information about every function called.

  1. Open a Jupyter notebook or an IPython terminal
  2. Add an interesting function (from Jake VanderPlas's book)
    def sum_of_lists(N):
        total = 0
        for i in range(5):
            # j >> i shifts the bits of j by i places to the right (same as j // 2 ** i)
            # ^ is bitwise exclusive or: each bit of j is kept where the corresponding
            # bit of (j >> i) is 0, and flipped where it is 1
            L = [j ^ (j >> i) for j in range(N)]
            total += sum(L)
        return total
  3. Run %prun:
    %prun sum_of_lists(10_000_000)
  4. Look at the table of results. What information does it give you? Can you find which operation takes the most time? (You may find it useful to look at the last column first.)
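For comparison, a similar per-function table can be produced outside IPython with the profiler that ships with Python (the cProfile and pstats modules). This is only a rough sketch of an equivalent, not a description of how %prun is implemented; the helper name profile_call is made up for this illustration, and a smaller N is used so that it runs quickly.

```python
import cProfile
import io
import pstats


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    # Collect the report in a string instead of printing it straight away
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by time spent inside each function itself, and show the top 10 rows
    stats.sort_stats("tottime").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(sum_of_lists, 100_000)
print(report)
```

As in the %prun output, the last column of the report names the function each row refers to.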

Using a line profiler in IPython/Jupyter

While %prun presents its results by function, the %lprun magic gives us line-by-line details.

  1. Load the extension in your IPython shell or Jupyter notebook:
    %load_ext line_profiler
  2. Run %lprun:
    %lprun -f sum_of_lists sum_of_lists(10_000_000)
  3. Can you interpret the results? On which line is most of the time spent?
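The same line-by-line report can also be requested from a plain Python script through the LineProfiler class of the line_profiler package. The sketch below assumes the package is installed and simply runs the function unprofiled if it is not; the helper name line_profile_call is made up for this illustration.

```python
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


try:
    from line_profiler import LineProfiler
except ImportError:
    LineProfiler = None  # package not installed


def line_profile_call(func, *args):
    """Run func(*args), printing per-line timings if line_profiler is available."""
    if LineProfiler is None:
        print("line_profiler is not installed; running without profiling")
        return func(*args)
    profiler = LineProfiler()
    # Register the function whose lines should be timed (like the -f option)
    profiler.add_function(func)
    result = profiler.runcall(func, *args)
    profiler.print_stats()
    return result


total = line_profile_call(sum_of_lists, 100_000)
```

Only functions registered with add_function are timed line by line, which mirrors the role of -f sum_of_lists in the %lprun command above.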