Add profiling episode

ashwinvis commented 2 months ago

Some of the text is already here:

https://enccs.github.io/hpda-python/optimization/#cprofile

What needs to be done is:

Change the example to run cProfile on wordcount.py
Show how to launch Snakeviz
Run line_profiler on the most CPU intensive function in the script

It can be written in rst if that makes life easier. Otherwise use the rst-to-myst tool and add it to:

https://github.com/ENCCS/python-perf/blob/main/content/profile.md
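
As a rough sketch of the first task, the profile could also be collected from inside Python instead of from the command line. The actual wordcount.py is not shown in this thread, so `count_words` and `profile_to_file` below are hypothetical stand-ins, and the input lines and output file name are made up for illustration.

```python
import cProfile
import os
import tempfile


def count_words(lines):
    """Hypothetical stand-in for the word-counting work in wordcount.py."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            word = word.strip(".,;:!?")  # drop surrounding punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts


def profile_to_file(func, prof_path, *args):
    """Run func(*args) under cProfile and dump the raw stats to prof_path."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    profiler.dump_stats(prof_path)  # same binary format that `-o` writes
    return result


# Made-up input; the real script reads a text file instead.
lines = ["the quick brown fox", "The lazy dog."] * 1000
prof_path = os.path.join(tempfile.mkdtemp(), "standin.prof")
counts = profile_to_file(count_words, prof_path, lines)
print(counts["the"], "occurrences of 'the'; profile written to", prof_path)
```

The dumped file has the same format as the one produced with `python -m cProfile -o`, so it should be possible to open it with `snakeviz <file>` or browse it with pstats.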

ashwinvis commented 2 months ago

This would be the command

$ python -m cProfile -o wordcount.prof source/wordcount.py data/concat.txt processed_data/concat.dat

and here's how to use pstats

$ python -m pstats wordcount.prof 
Welcome to the profile statistics browser.
wordcount.prof% sort tottime
wordcount.prof% stats
Wed Sep 25 11:52:27 2024    wordcount.prof

         53473208 function calls in 8.410 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1233410    4.151    0.000    7.204    0.000 source/wordcount.py:41(update_word_counts)
 32068660    1.799    0.000    1.799    0.000 {method 'replace' of 'str' objects}
  7747363    0.570    0.000    0.570    0.000 {method 'lower' of 'str' objects}
  7747363    0.428    0.000    0.428    0.000 {method 'strip' of 'str' objects}
  1530212    0.271    0.000    0.271    0.000 source/wordcount.py:23(<genexpr>)
  1233411    0.256    0.000    0.256    0.000 {method 'split' of 'str' objects}
        1    0.184    0.184    7.388    7.388 source/wordcount.py:59(calculate_word_counts)
   382553    0.133    0.000    0.404    0.000 {method 'join' of 'str' objects}
        1    0.126    0.126    0.580    0.580 source/wordcount.py:16(save_word_counts)
...
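
For the episode, the interactive `sort tottime` / `stats` session above could also be reproduced in a script with the `pstats` module. This is a minimal sketch, not the episode's final text: `clean` and `tally` are hypothetical stand-ins for the functions in wordcount.py, and the stats are read from an in-memory profiler rather than from wordcount.prof.

```python
import cProfile
import io
import pstats


def clean(word):
    """Hypothetical helper: strip punctuation and lowercase a word."""
    return word.strip(".,").lower()


def tally(text):
    """Hypothetical stand-in for the word-counting function."""
    counts = {}
    for word in text.split():
        word = clean(word)
        counts[word] = counts.get(word, 0) + 1
    return counts


# Collect a profile in memory for the stand-in workload.
profiler = cProfile.Profile()
profiler.runcall(tally, "a b. A, c " * 5000)

# Equivalent of `sort tottime` followed by `stats`, limited to 5 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

With a profile saved on disk, `pstats.Stats("wordcount.prof")` would load it directly in place of the profiler object.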

ffrancesco94 commented 2 months ago

Thanks! Will include that and then show snakeviz as well.

ffrancesco94 commented 2 months ago

Should I make the IPython version as well?

ashwinvis commented 2 months ago

If possible

ffrancesco94 commented 2 months ago

It's quite interesting that if I run cProfile from the shell, everything works. If I use IPython with

%run -p -D wordcount.prof source/wordcount.py data/concat.txt processed-data/concat.dat

~~it complains about an invalid unicode character.~~

Nvm, was calling it wrong.

ffrancesco94 commented 2 months ago

Look at #3.

ENCCS / python-perf

Add profiling episode #2