UCL / TLOmodel-profiling

Profiling outputs for developer use
0 stars 0 forks source link

Pyinstrument grouping is incorrect #19

Open tamuri opened 4 months ago

tamuri commented 4 months ago

Not sure whether this can be fixed or corrected at all - probably not - but I noticed that the indentation of samples in the pyinstrument output is incorrect. Regular events are being placed as if they are being run by the health system scheduler. Any ideas?

Screenshot 2024-05-27 at 11 48 51
willGraham01 commented 4 months ago

This might be to do with the cross-class function calls in the codebase, and how the function which sets off the population level events is called?

So the population level events are being run by HealthSystem.run_population_level_events, but via a method which is only ever called via HealthSystemScheduler.module (which points to the HealthSystem) within HealthSystemScheduler.process_events_mode_2.

So I guess PyInstrumment is technically right that the pop-level events are run within the scheduler class, though I'd now expect the population-level events to be nested within HealthSystemScheduler.process_events_mode_2 if I'm following the call stack, rather than one level higher :sweat_smile:

tamuri commented 4 months ago

I don't think that's right. For example, the event HealthSeekingBehaviourPoll is not an HSI event, and is not run by the HealthSystem. It is processed in the main Simulation event loop, so I'd expect it to be under Simulation.fire_single_event.

tamuri commented 4 months ago

Compare with cprofile dump:

Screenshot 2024-06-04 at 07 25 18
matt-graham commented 21 hours ago

I've looked in to this a bit and think I've identified what is going on. The key issue seems to be that the class name used in the identifier for a frame corresponding to a method call will be written out as a particular subclass (specifically the mostly commonly observed subclass in the captured frames) even if the method being called is defined on a base class.

As a simple minimal reproducer, consider the following script corresponding to a very simplified version of the TLOmodel simulation call stack:


import time

def simulate():
    time.sleep(0.1)
    events = [EventA(), EventB(), EventA()] * 10
    for event in events:
        event.run()

class Event:

    def run(self):
        self.apply()

class EventA(Event):

    def apply(self):
        time.sleep(0.1)

class EventB(Event):

    def apply(self):
        time.sleep(0.1)

if __name__ == "__main__":
    simulate()

Running pyinstrument on this script (saved as pyinstrument_test.py) gives the following output using the console renderer:

  _     ._   __/__   _ _  _  _ _/_   Recorded: 15:46:49  Samples:  31
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.110     CPU time: 0.005
/   _/                      v4.7.3

Program: pyinstrument_test.py

3.110 <module>  pyinstrument_test.py:1
└─ 3.110 simulate  pyinstrument_test.py:4
   ├─ 3.009 EventA.run  pyinstrument_test.py:13
   │  ├─ 2.006 EventA.apply  pyinstrument_test.py:19
   │  │  └─ 2.006 sleep  <built-in>
   │  └─ 1.003 EventB.apply  pyinstrument_test.py:25
   │     └─ 1.003 sleep  <built-in>
   └─ 0.101 sleep  <built-in>

Here we see that underneath the top-level simulate call, all event apply method calls are grouped under a frame with identifier EventA.run. This corresponds in reality to the run method defined on the base Event class, which we can see from the correct line number reference in to the script. However as the concrete classes from which the run method is called are EventA and EventB, when recording these frames pyinstrument appears to record both these as the class associated with the method call and then resolve the ambiguity when rendering the output, just outputs the mosts commonly observed class name:

https://github.com/joerick/pyinstrument/blob/4b37f8cdc531be41a7f7e57932f0b770244025d5/pyinstrument/frame.py#L308-L309

https://github.com/joerick/pyinstrument/blob/4b37f8cdc531be41a7f7e57932f0b770244025d5/pyinstrument/frame.py#L284-L305

This seems like a bug to me or at least is a bit misleading so I'll raise an issue on pyinstrument repository and see if we can get fixed.

Importantly it does seem however the overall call tree recorded is correct and if we just look at the file / line number references in the existing output we can correctly identify whether methods are actually defined.

matt-graham commented 21 hours ago

Accidentally hit close while writing comment 🤦