Closed ashwinvis closed 2 months ago
This would be the command:
$ python -m cProfile -o wordcount.prof source/wordcount.py data/concat.txt processed_data/concat.dat
And here's how to use pstats:
$ python -m pstats wordcount.prof
Welcome to the profile statistics browser.
wordcount.prof% sort tottime
wordcount.prof% stats
Wed Sep 25 11:52:27 2024    wordcount.prof
         53473208 function calls in 8.410 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1233410    4.151    0.000    7.204    0.000 source/wordcount.py:41(update_word_counts)
 32068660    1.799    0.000    1.799    0.000 {method 'replace' of 'str' objects}
  7747363    0.570    0.000    0.570    0.000 {method 'lower' of 'str' objects}
  7747363    0.428    0.000    0.428    0.000 {method 'strip' of 'str' objects}
  1530212    0.271    0.000    0.271    0.000 source/wordcount.py:23(<genexpr>)
  1233411    0.256    0.000    0.256    0.000 {method 'split' of 'str' objects}
        1    0.184    0.184    7.388    7.388 source/wordcount.py:59(calculate_word_counts)
   382553    0.133    0.000    0.404    0.000 {method 'join' of 'str' objects}
        1    0.126    0.126    0.580    0.580 source/wordcount.py:16(save_word_counts)
...
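For the lesson text, the same workflow can probably also be shown without the interactive browser. Below is a minimal sketch using only the standard library. `count_words` and the sample lines are hypothetical stand-ins, not the actual code in source/wordcount.py, and the profile is written to a temporary directory only so the snippet is self-contained.

```python
import cProfile
import os
import pstats
import tempfile


def count_words(lines):
    """Toy stand-in for the real word-count logic (hypothetical)."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            word = word.strip(".,;:!?")
            counts[word] = counts.get(word, 0) + 1
    return counts


lines = ["The quick brown fox.", "The lazy dog!"] * 1000

# Profile a single call; `python -m cProfile -o wordcount.prof ...`
# does the same for a whole script.
profiler = cProfile.Profile()
result = profiler.runcall(count_words, lines)

# Write the collected data to a .prof file that pstats (or snakeviz) can load.
prof_path = os.path.join(tempfile.mkdtemp(), "wordcount.prof")
profiler.dump_stats(prof_path)

# Equivalent of `sort tottime` followed by `stats` in the pstats browser,
# limited to the 10 most expensive entries.
stats = pstats.Stats(prof_path)
stats.sort_stats("tottime").print_stats(10)
```

Sorting by `tottime` ranks functions by time spent in their own body, which is why update_word_counts tops the listing above even though calculate_word_counts has the larger cumulative time.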
Thanks! Will include that and then show snakeviz as well.
Should I make the IPython version as well?
If possible
It's quite interesting that if I run cProfile from the shell, everything works. If I use IPython with
%run -p -D wordcount.prof source/wordcount.py data/concat.txt processed-data/concat.dat
it complains about an invalid unicode character.
Nvm, was calling it wrong.
Look at #3.
Some of the text is already here:
https://enccs.github.io/hpda-python/optimization/#cprofile
What needs to be done is:
- cProfile on wordcount.py
- line_profiler on the most CPU-intensive function in the script
It can be written in rst if it makes life easier. Otherwise use the rst-to-myst tool and add it to:
https://github.com/ENCCS/python-perf/blob/main/content/profile.md
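One possible way to decide which function to hand to line_profiler is to pick the entry with the largest tottime among the script's own functions. Below is a rough sketch, assuming Python 3.9+ for `Stats.get_stats_profile()`. `tally` and `clean` are hypothetical toy functions, not the ones in wordcount.py.

```python
import cProfile
import pstats


def clean(word):
    """Hypothetical helper: normalise a single word."""
    return word.strip(".,;:!?").lower()


def tally(lines):
    """Hypothetical stand-in for the word-counting loop."""
    counts = {}
    for line in lines:
        for word in line.split():
            key = clean(word)
            counts[key] = counts.get(key, 0) + 1
    return counts


profiler = cProfile.Profile()
profiler.runcall(tally, ["The quick brown fox.", "The lazy dog!"] * 5000)

# get_stats_profile() returns per-function records keyed by function name
# (available since Python 3.9).
profile = pstats.Stats(profiler).get_stats_profile()

# Built-in methods are reported with file_name "~"; keep only functions
# that have real source lines, since line_profiler needs those.
own = {
    name: fp
    for name, fp in profile.func_profiles.items()
    if fp.file_name != "~"
}

# The candidate for line_profiler is the one with the largest internal time.
hot_name, hot = max(own.items(), key=lambda item: item[1].tottime)
print(f"{hot_name} at {hot.file_name}:{hot.line_number}, tottime={hot.tottime:.3f}s")
```

Note that `func_profiles` is keyed by bare function name, so two functions with the same name in different files would collide. For a single script like wordcount.py that should not matter.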