Open nuttamas opened 3 years ago
Profiling code #186
Try the given code in a Jupyter notebook

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        # j >> i == j // 2 ** i (shift j bits i places to the right)
        # j ^ i -> bitwise exclusive or; j's bit doesn't change if i's = 0, changes to complement if i's = 1
        total += sum(L)
    return total

and run
%prun sum_of_lists(10_000_000)
This gave the result below:
14 function calls in 13.066 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     5   10.708    2.142   10.708    2.142  :4( )
     5    1.422    0.284    1.422    0.284  {built-in method builtins.sum}
     1    0.761    0.761   12.891   12.891  :1(sum_of_lists)
     1    0.175    0.175   13.066   13.066  :1( )
     1    0.000    0.000   13.066   13.066  {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

The lines that take the most time are L = [j ^ (j >> i) for j in range(N)] (10.708 s over 5 calls) and total += sum(L) (1.422 s over 5 calls).
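
Outside a notebook, a similar report can be produced with the standard-library cProfile and pstats modules. The sketch below is only an illustration of that route, not the way %prun was run above. The helper name profile_sum_of_lists is made up for this example, and a much smaller N is used so it finishes quickly.

```python
import cProfile
import io
import pstats


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def profile_sum_of_lists(n):
    """Profile sum_of_lists(n); return (result, report text).

    Hypothetical helper for illustration, not part of the exercise code.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = sum_of_lists(n)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by "tottime" orders by internal time, like the %prun output.
    stats.sort_stats("tottime").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Smaller N than in the issue, to keep the run short.
    result, report = profile_sum_of_lists(100_000)
    print(report)
```

On recent Python versions the list comprehension may not appear as its own row, in which case its time is counted under sum_of_lists.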
Then try
%load_ext line_profiler
%lprun -f sum_of_lists sum_of_lists(10_000_000)
Got the result
Timer unit: 1e-06 s

Total time: 19.2983 s
File:
Function: sum_of_lists at line 1

Line #   Hits         Time    Per Hit  % Time  Line Contents
     1                                         def sum_of_lists(N):
     2      1          2.0        2.0     0.0      total = 0
     3      6         22.0        3.7     0.0      for i in range(5):
     4      5   18459236.0  3691847.2    95.7          L = [j ^ (j >> i) for j in range(N)]
     5                                                 # j >> i == j // 2 ** i (shift j bits i places to the right)
     6                                                 # j ^ i -> bitwise exclusive or; j's bit doesn't change if i's = 0, changes to complement if i's = 1
     7      5     839024.0   167804.8     4.3          total += sum(L)
     8      1          1.0        1.0     0.0      return total

The line that takes the most time (95.7%) is the list comprehension: L = [j ^ (j >> i) for j in range(N)]
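
The comments in the profiled function state that j >> i equals j // 2 ** i and describe how XOR flips bits. The small check below confirms both statements for non-negative integers. The function names are invented for this sketch. Note that the hot line actually XORs j with j >> i, not with i.

```python
def shift_matches_floor_division(limit=1000, max_shift=5):
    """Check j >> i == j // 2 ** i for 0 <= j < limit and 0 <= i < max_shift."""
    return all(
        (j >> i) == (j // 2 ** i)
        for j in range(limit)
        for i in range(max_shift)
    )


def xor_bits(j, mask, width=4):
    """Return j ^ mask as a zero-padded binary string of the given width."""
    return format(j ^ mask, "0{}b".format(width))


if __name__ == "__main__":
    print(shift_matches_floor_division())
    # Bits of j stay the same where mask has 0 and flip where mask has 1.
    print(xor_bits(0b1100, 0b1010))
```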
Approximating π using Numba/Cython #195
Running calc_pi_numba.py gave the result:
Elapsed (with compilation) = 900 msec
pi = 3.1272 (with 10000)
Elapsed (after compilation) = 180 μsec
pi = 3.142 (with 10000)
The code takes much less time than the original.
Using Cython, got the result
5.48 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Cython takes longer than Numba (5.48 ms per loop versus 180 μsec after compilation).
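
The two Numba runs report different estimates (3.1272 and 3.142) for the same 10000 samples, which suggests a random-sampling (Monte Carlo) estimate. calc_pi_numba.py itself is not shown here, so the following is only an assumed pure-Python sketch of that kind of calculation. The function name calc_pi and the seed parameter are hypothetical.

```python
import random


def calc_pi(n, seed=None):
    """Estimate pi from n random points in the unit square.

    Assumed Monte Carlo approach; not the code from the exercise.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        # Count points that fall inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi / 4 of the unit square.
    return 4.0 * inside / n


if __name__ == "__main__":
    print("pi =", calc_pi(10_000), "(with 10000)")
```

A plain loop like this is the kind of code that a JIT compiler or Cython typically speeds up, since all the time is spent in per-iteration interpreter overhead.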
Improving performance using MPI #194
The code uses the Message Passing Interface (MPI) to accomplish this approximation of π.
run python calc_pi.py -np 10_000_000 -n 1 -r 1
got
pi = 3.1411436 (with 10000000)
1 loops, best of 1: 7.97 sec per loop
NumPy is the fastest. MPI could be faster when the work is split into parallel tasks.
Running the given function without numpy
python -m timeit -n 100 -r 5 -s "from calc_pi import calculate_pi_timeit" "calculate_pi_timeit(10_000)()"
get
After switching to NumPy functions, run:
python -m timeit -n 100 -r 5 -s "from calc_pi_np import calculate_pi_timeit" "calculate_pi_timeit(10_000)()"
get
numpy makes the code faster.
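
The same measurement can be made from Python with the standard-library timeit module. In the sketch below, calculate_pi_timeit is a hypothetical stand-in that returns a zero-argument callable, as the command-line statement calculate_pi_timeit(10_000)() implies. The real calc_pi and calc_pi_np modules are not shown here, so the workload is a placeholder.

```python
import timeit


def calculate_pi_timeit(n):
    """Hypothetical stand-in: return a zero-argument callable to time."""
    def run():
        # Placeholder workload; the real module's calculation is not shown.
        return sum(k * k for k in range(n))
    return run


def best_time_per_loop(func, number=100, repeat=5):
    """Mirror `python -m timeit -n NUMBER -r REPEAT`.

    Returns the best (smallest) average time per loop in seconds.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


if __name__ == "__main__":
    seconds = best_time_per_loop(calculate_pi_timeit(10_000))
    print("100 loops, best of 5: {:.3g} sec per loop".format(seconds))
```

Unlike the command-line statement, this passes the already-built callable to timeit, so the cost of calling the factory is not included in each loop.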