exaxorg / accelerator

The Accelerator is a tool for fast and reproducible processing of eBay-scale datasets on a single computer.
https://exax.org
Apache License 2.0

Support real method profiling out-of-the-box #5

Open pabloyoyoista opened 2 weeks ago

pabloyoyoista commented 2 weeks ago

We've been using Python's cProfile quite successfully, and the changes required for a method to be profiled have been mostly minimal.

They can be summarized as follows:

I managed to do this in a custom way by renaming analysis and/or synthesis to run_analysis and run_synthesis, e.g.:

Introducing a profiling analysis wrapper like this:

import cProfile

def analysis(sliceno, prepare_res):
    # Profile the renamed run_analysis and dump the stats to one file per slice.
    cProfile.runctx(
        'run_analysis(sliceno, prepare_res)',
        globals(),
        locals(),
        filename=f"profile.{sliceno}"
    )

And then aggregating and printing the results in synthesis:

import pstats

def synthesis(slices):
    stats = pstats.Stats()
    for i in range(slices):
        # The filename must match the one written by analysis for each slice.
        stats.add(f"profile.{i}")
    stats.sort_stats('cumulative')
    #stats.print_stats()
    stats.print_callers()
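For anyone who wants to try the pattern outside the Accelerator, here is a minimal standalone sketch of the same two steps. It makes some assumptions: the body of run_analysis is a made-up workload, the extra workdir parameter and the demo function are hypothetical helpers that only exist to simulate the slices in one process, and the report is captured into a string instead of being printed.

```python
import cProfile
import io
import os
import pstats
import tempfile


def run_analysis(sliceno, prepare_res):
    # Stand-in for the real per-slice work (hypothetical workload).
    return sum(i * i for i in range(10_000 * (sliceno + 1)))


def analysis(sliceno, prepare_res, workdir):
    # Profile one slice and dump its stats to a per-slice file.
    cProfile.runctx(
        'run_analysis(sliceno, prepare_res)',
        globals(),
        locals(),
        filename=os.path.join(workdir, f"profile.{sliceno}"),
    )


def synthesis(slices, workdir):
    # Merge the per-slice dumps and return the formatted report as a string.
    out = io.StringIO()
    stats = pstats.Stats(stream=out)
    for i in range(slices):
        stats.add(os.path.join(workdir, f"profile.{i}"))
    stats.sort_stats('cumulative')
    stats.print_stats()
    return out.getvalue()


def demo(slices=3):
    # Simulate the slices sequentially; the Accelerator would run them in parallel.
    with tempfile.TemporaryDirectory() as workdir:
        for sliceno in range(slices):
            analysis(sliceno, None, workdir)
        return synthesis(slices, workdir)
```

Calling print(demo()) shows a single table with the timings of all slices merged, sorted by cumulative time.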

I believe being able to do this out-of-the-box from urd.build would be an extraordinary feature. We would certainly need to expose some way to print the statistics from urd.build, for which I'd need some input. Otherwise, I'd be quite happy to work on this implementation if it is deemed reasonable.
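To make the idea a bit more concrete, this is a rough sketch of how the wrapping could be done without renaming anything. It is not an existing Accelerator API: the profiled decorator, the merged_stats helper, and the prefix argument are all hypothetical names, and it uses only the standard cProfile and pstats modules.

```python
import cProfile
import functools
import os
import pstats
import tempfile


def profiled(prefix):
    """Hypothetical decorator: profile each call, dump stats to '<prefix>.<sliceno>'."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(sliceno, *args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall returns whatever func returns, so results pass through.
                return profiler.runcall(func, sliceno, *args, **kwargs)
            finally:
                # Write the dump even if func raised, so failures can be inspected.
                profiler.dump_stats(f"{prefix}.{sliceno}")
        return wrapper
    return decorator


def merged_stats(prefix, slices):
    # Hypothetical helper: load all per-slice dumps into one pstats.Stats object.
    paths = [f"{prefix}.{i}" for i in range(slices)]
    return pstats.Stats(*paths)
```

With something like this, a method would only need to decorate its existing analysis function, and whatever ends up printing the statistics could call merged_stats(prefix, slices).sort_stats('cumulative').print_callers().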