bloomberg / memray

Memray is a memory profiler for Python
https://bloomberg.github.io/memray/
Apache License 2.0

Create a demo program that showcases temporal flame graphs #392

Closed by godlygeek 1 year ago

godlygeek commented 1 year ago

We probably want a demo where there's some setup code that uses a bunch of temporary memory, and then some other code that uses whatever was produced by the setup code, and we could demo how you could see what's going on in the setup phase vs the run phase. But that requires us to come up with some new demo code...

If we can come up with something good, we could add it as one of the example applications, and include its temporal flame graph in the docs so that people can easily play around with it.

Originally posted by @godlygeek in https://github.com/bloomberg/memray/issues/391#issuecomment-1584910466

mgmacias95 commented 1 year ago

Hi @godlygeek,

I made this simple script:

import random
from collections import Counter
import gc

def set_up():
  b = [random.choice(range(10000)) for _ in range(100000000)]
  b.sort()
  c = Counter(b)
  return dict(filter(lambda x: x[1] > 1000, c.items()))

def run():
  for _ in range(3):
    f = set_up()
    key = random.choice(list(f.keys()))
    print(f'{key}: {f[key]}')
    gc.collect()

run()

which generates the following flamegraph:

[image: flame graph generated from the script above]

Would this be a good sample to showcase the temporal flamegraphs?

Thanks!

godlygeek commented 1 year ago

I like it, but I think we can probably come up with an even better example. I'm hoping to find something with two distinct peaks, with different sets of allocations on the heap at each peak. All of the peaks in this one are different calls to set_up(), because run() itself consumes very little memory after set_up() has returned.

Hm. Maybe a program that shows two different techniques for computing Fibonacci numbers or prime numbers or something like that and then checks that the two return the same values... or something in that direction... And then the reason to use the temporal flame graph would be to check what objects each of the techniques required on the heap.
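
The observation above, that nearly all of the memory belongs to set_up() and very little survives once it returns, can be checked roughly with the standard library's tracemalloc module (used here instead of memray). This is only a sketch: the list size, value range, and count threshold are scaled down from the posted script and are assumptions chosen so it finishes in a moment, and measure() is a hypothetical helper.

```python
import random
import tracemalloc
from collections import Counter


def set_up(n=200_000):
    # Same shape as the posted script, scaled down (assumed sizes).
    b = [random.choice(range(1000)) for _ in range(n)]
    b.sort()
    c = Counter(b)
    # Only this filtered dict escapes; `b` and `c` are freed on return.
    return {k: v for k, v in c.items() if v > 100}


def measure():
    """Return (entries kept, bytes still allocated, peak bytes) for one set_up() call."""
    tracemalloc.start()
    f = set_up()
    retained, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(f), retained, peak


if __name__ == "__main__":
    entries, retained, peak = measure()
    print(f"entries kept: {entries}")
    print(f"retained after set_up(): {retained} bytes")
    print(f"peak inside set_up():    {peak} bytes")
```

The peak should come out far larger than what is retained, which matches the point that every peak in the flame graph is just another call to set_up().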

mgmacias95 commented 1 year ago

> Hm. Maybe a program that shows two different techniques for computing Fibonacci numbers or prime numbers or something like that and then checks that the two return the same values... or something in that direction...

Here you go!

import sys

def fib1(n):
  l = [0, 1]
  for i in range(2, n+1):
    l.append(l[i-1] + l[i-2])
  return l[-1]

def fib2(n, cache={0: 0, 1: 1}):
  if n in cache:
    return cache[n]
  cache[n] = fib2(n-1) + fib2(n-2)
  return cache[n]

def run():
  sys.setrecursionlimit(100000)
  n = 99900
  a = fib1(n)
  b = fib2(n)

  assert a == b

run()

[image: flame graph of the script above]
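
For a rough idea of what each technique leaves on the heap without running memray, the two functions can be compared with the standard library's tracemalloc. This is a sketch under assumptions: retained_and_peak() is a hypothetical helper, and n is kept small so fib2 stays under the default recursion limit. Note that fib2's default cache dict persists between calls, so the first measurement of fib2 is the meaningful one.

```python
import tracemalloc


def fib1(n):
    # Builds the whole sequence in a local list, which is freed on return.
    l = [0, 1]
    for i in range(2, n + 1):
        l.append(l[i - 1] + l[i - 2])
    return l[-1]


def fib2(n, cache={0: 0, 1: 1}):
    # The default dict is shared across calls, so cached values stay alive.
    if n in cache:
        return cache[n]
    cache[n] = fib2(n - 1) + fib2(n - 2)
    return cache[n]


def retained_and_peak(func, n):
    """Return (bytes still allocated after func(n), peak bytes during it)."""
    tracemalloc.start()
    result = func(n)  # kept alive so both functions are measured alike
    retained, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return retained, peak


if __name__ == "__main__":
    n = 300  # assumed small value, well below the default recursion limit
    for func in (fib1, fib2):
        retained, peak = retained_and_peak(func, n)
        print(f"{func.__name__}: retained={retained} bytes, peak={peak} bytes")
```

fib1 should retain little more than its result, while fib2 keeps every cached value alive after it returns. That difference in what is on the heap at each point in time is the kind of thing a temporal flame graph is meant to expose.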

mgmacias95 commented 1 year ago

Hey @godlygeek,

Were you able to take a look here?

Thanks!

godlygeek commented 1 year ago

Sorry for the delay - this looks quite good to me! I think it's an excellent demonstration of something where you can find out more information from our temporal flame graphs than you could from our default flame graphs.

I guess the next step here would be a PR that adds this demo program as one of our example applications and includes a temporal flame graph generated from it in our documentation.

If you'd like to do that, I'd be happy to take a PR for those changes. Note that the generated flame graph will contain some details about your system (paths to your Python code and C libraries, as well as the command line you launched your program with). Nothing too invasive, but you can say that you'd rather have us generate the flame graph instead of you if that makes you nervous.
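
As a sketch of how such a graph is typically produced: record the run with `memray run`, then render it with `memray flamegraph --temporal`. The helper below only builds those two command lines. The function name and the file names fib_demo.py and fib_demo.bin are placeholders, not the names used in the eventual PR.

```python
import sys


def temporal_flamegraph_commands(script, capture="fib_demo.bin"):
    """Build the memray invocations for a temporal flame graph (placeholder names)."""
    # Step 1: run the script under memray, writing allocation records to `capture`.
    record = [sys.executable, "-m", "memray", "run", "-o", capture, script]
    # Step 2: render the capture file as a temporal flame graph (HTML report).
    render = [sys.executable, "-m", "memray", "flamegraph", "--temporal", capture]
    return [record, render]


if __name__ == "__main__":
    for cmd in temporal_flamegraph_commands("fib_demo.py"):
        # Printed rather than executed; pass each list to subprocess.run()
        # on a machine where memray is installed to actually run it.
        print(" ".join(cmd))
```

Since the rendered report embeds paths and the launch command line, as noted above, it is worth looking it over before publishing it.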

Thanks for the help here, Marta!

mgmacias95 commented 1 year ago

Hi @godlygeek,

Made the PR -> https://github.com/bloomberg/memray/pull/412. Check it out.

Thanks.

godlygeek commented 1 year ago

Resolved by #412 - thanks @mgmacias95!