benfred / py-spy

Sampling profiler for Python programs
MIT License

De-duplicate speedscope output #473

Open jonashaag opened 2 years ago

jonashaag commented 2 years ago

Speedscope profiles can grow so large it's impossible to import them into speedscope.app.

One simple thing we could do is de-duplicate the samples + weights lists by merging consecutive identical samples and summing their weights:

{
  "samples": [
    [1, 2, 3],
    [1, 2, 3],
    [4],
   ...
  ],
  "weights": [
    0.1,
    0.1,
    0.1,
    ...
  ]
}

->

{
  "samples": [
    [1, 2, 3],
    [4],
    ...
  ],
  "weights": [
    0.2,
    0.1,
    ...
  ]
}

Here's some Python code to do it:

def dedup(prof):
    """Merge runs of consecutive identical samples in place, summing their weights."""
    samples2 = []
    weights2 = []
    for sample, weight in zip(prof["samples"], prof["weights"]):
        if samples2 and samples2[-1] == sample:
            # Same stack as the previous sample: fold its weight into that entry.
            weights2[-1] += weight
        else:
            # New stack: start a new entry.
            samples2.append(sample)
            weights2.append(weight)
    prof["samples"] = samples2
    prof["weights"] = weights2

import json

with open("my-speedscope-prof.json") as f:
    profile = json.load(f)
for p in profile["profiles"]:
    dedup(p)
with open("my-speedscope-prof-deduped.json", "w") as f:
    json.dump(profile, f)
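
For comparison, here is a minimal sketch of the same idea written with itertools.groupby, which only groups adjacent equal items and so keeps the samples in time order. The function name dedup_runs and the sample data are made up for illustration. It also assumes the samples and weights lists have the same length, as in the example above.

```python
from itertools import groupby
from operator import itemgetter


def dedup_runs(samples, weights):
    """Collapse runs of consecutive identical samples, summing their weights.

    Hypothetical helper for illustration; returns new lists instead of
    modifying the profile in place.
    """
    out_samples, out_weights = [], []
    pairs = zip(samples, weights)
    # groupby only merges adjacent pairs whose sample (item 0) compares equal,
    # so a stack that reappears later stays a separate entry.
    for sample, run in groupby(pairs, key=itemgetter(0)):
        out_samples.append(sample)
        out_weights.append(sum(w for _, w in run))
    return out_samples, out_weights


samples = [[1, 2, 3], [1, 2, 3], [4], [1, 2, 3]]
weights = [0.1, 0.1, 0.1, 0.1]
new_samples, new_weights = dedup_runs(samples, weights)

# The total weight should be unchanged (up to floating-point rounding).
assert abs(sum(new_weights) - sum(weights)) < 1e-9
```

Note that the last [1, 2, 3] is not merged with the first two because [4] sits between them. Merging non-adjacent duplicates would shrink the file further, but it would also reorder the samples in time.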