UCL-COMP0233-2022-2023 / RSE-Classwork

Profiling code #55

Open dpshelio opened 1 year ago

dpshelio commented 1 year ago

Even when we measure the total time that a function takes to run (#54), that doesn't help us with knowing which parts of the code are slow!

To look into that, we need a different tool called a profiler. Python comes with its own profiler, but we will use a more convenient tool.
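For reference, Python's built-in profiler can also be driven from a plain script. The following is only a minimal sketch using the standard-library `cProfile` and `pstats` modules; `slow_square_sum` and `profile_call` are hypothetical names made up for this illustration and are not part of the exercise.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Hypothetical example function: sum of squares via an explicit list."""
    squares = [k * k for k in range(n)]
    return sum(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Collect the report in a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most the first 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(slow_square_sum, 100_000)
print(report)
```

The printed table has one row per function, much like the output of the magic used below; the magics are simply less typing inside IPython/Jupyter.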

Setup

This exercise will work with IPython or Jupyter notebooks, and will use two "magic" commands available there. You may need some steps to set them up first.

If you use Anaconda, you should already have access to Jupyter. If you don't, let us know on Moodle or use pip install ipython to install IPython.

The %prun magic should already be available with every installation of IPython/Jupyter. However, you may need to install the second magic (%lprun). If you use Anaconda, run conda install line_profiler from a terminal. Otherwise, use pip install line_profiler.

Using profiling tools in IPython/Jupyter notebook

The %prun magic gives us information about every function called.

  1. Open a Jupyter notebook or an IPython terminal.
  2. Add an interesting function (from Jake VanderPlas's book):
    def sum_of_lists(N):
        total = 0
        for i in range(5):
            L = [j ^ (j >> i) for j in range(N)]
            # j >> i == j // 2 ** i (shifts the bits of j by i places to the right)
            # a ^ b is bitwise exclusive or: a bit of a is unchanged where b's bit is 0 and flipped where b's bit is 1
            total += sum(L)
        return total
  3. Run %prun:
    %prun sum_of_lists(10_000_000)
  4. Look at the table of results. What information does it give you? Can you find which operation takes the most time? (You may find it useful to look at the last column first)
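The comments inside sum_of_lists describe two bitwise operators. As a quick check of those descriptions (a small sketch on hand-picked values, separate from the profiling itself), the following can be run in the same session; `mask` and `flipped` are names chosen only for this illustration.

```python
# Right shift by i equals floor division by 2 ** i for non-negative integers.
for j in range(64):
    for i in range(5):
        assert j >> i == j // 2 ** i

# XOR flips exactly those bits of j where the other operand has a 1.
j, mask = 0b1100, 0b1010
flipped = j ^ mask
print(format(flipped, "04b"))  # → 0110
```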

Using a line profiler in IPython/Jupyter

While %prun presents its results by function, the %lprun magic gives us line-by-line details.

  1. Load the extension in your IPython shell or Jupyter notebook:
    %load_ext line_profiler
  2. Run %lprun:
    %lprun -f sum_of_lists sum_of_lists(10_000_000)
  3. Can you interpret the results? On which line is most of the time spent?
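To get a feel for what a per-line report measures, here is a rough standard-library sketch that times each statement by hand with `time.perf_counter`. This is not how line_profiler works internally, and `timed_steps` is a hypothetical function invented for this illustration; it only mimics the kind of breakdown that %lprun produces automatically.

```python
import time


def timed_steps(n):
    """Hypothetical function with a manual timer around each statement."""
    timings = {}

    start = time.perf_counter()
    values = [k * k for k in range(n)]
    timings["build list"] = time.perf_counter() - start

    start = time.perf_counter()
    total = sum(values)
    timings["sum list"] = time.perf_counter() - start

    return total, timings


total, timings = timed_steps(200_000)
for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

Wrapping every line like this quickly becomes tedious and clutters the code, which is the reason to prefer a line profiler for anything beyond a couple of statements.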

Finishing up

When you are done, react to this issue using one of the available emojis, and/or comment with your findings: Which function takes the most time? Which line of the code?

anda-raluca commented 1 year ago

Line 4, L = [j ^ (j >> i) for j in range(N)]