GoogleCloudPlatform / cloud-profiler-python

Stackdriver Profiler Python agent is a tool that continuously gathers CPU usage information from Python applications
Apache License 2.0
27 stars 23 forks source link

[Wall time] Google Cloud Profiler Strange Behavior With Python time.sleep() #134

Open crimsonaltima opened 1 year ago

crimsonaltima commented 1 year ago

I have been experimenting with Google Cloud Profiler and Python.

I wanted to know if it was a reliable way to measure how long a function takes to run that contains sleep commands.

I have a gist with 3 sample scripts with slight variations. Mostly the same with differing wait times. The last is similar to the second except in the main loop it calls step1 multiple times at the end. https://gist.github.com/crimsonaltima/402b41ee23be20491d73f97832c0c017

Running these, produce the following results. Seq1

Screen Shot 2023-03-28 at 3 24 37 PM

Seq2

Screen Shot 2023-03-28 at 3 24 21 PM

Seq3

Screen Shot 2023-03-28 at 3 24 48 PM

One thing I find odd about the profiler is that it shows a max duration of 10 seconds. Despite some functions sleeping for over 25 seconds. Is there an explanation for this? While the bar sizes are accurate relative to their sleep time, it does not show exactly how long a function takes.

A second question, in mainseq3.py I call step 1 multiple times and noticed that all the steps1s were merged into a single bar and there is no distinction as for if this was the first call to step1 or a latter. Is that expected behavior?

A lesser question I have is does time.sleep(...) have odd effects on the profiler? I noticed the script with the really long sleep times took a very long time to begin uploading data to google cloud profiler.

aabmass commented 1 year ago

One thing I find odd about the profiler is that it shows a max duration of 10 seconds. Despite some functions sleeping for over 25 seconds. Is there an explanation for this? While the bar sizes are accurate relative to their sleep time, it does not show exactly how long a function takes.

This profiler is a sampling aka continuous profiler, meaning that it periodically checks what code is being executed. Compare this to deterministic profiling where every call is traced. The result is exactly what you found, the bar sizes are correct relative to each other but don't add up to true wall time as they only represent samples. Check out this doc for more discussion on how this works and what it's useful for.

A second question, in mainseq3.py I call step 1 multiple times and noticed that all the steps1s were merged into a single bar and there is no distinction as for if this was the first call to step1 or a latter. Is that expected behavior?

For now, the UI does not distinguish between different line numbers in a function.

A lesser question I have is does time.sleep(...) have odd effects on the profiler? I noticed the script with the really long sleep times took a very long time to begin uploading data to google cloud profiler.

If you are using the wall profiler, it should not have any effect afaik. The amount of time before uploading could be caused by many factors.