Closed matt-graham closed 6 months ago
My understanding of this function is that it acts as an initial trigger for the scheduling of specific HSI events connected with different disease modules, but beyond scheduling further HSI events, should not immediately make any changes to the properties of the individual with the associated
person_id
? @tbhallett can I check this is correct.
100% correct
This sounds excellent! :-)
~SymptomManager does have a have_what(person_ids)
method which could be used to efficiently get all the symptoms for persons. It's using apply
right now but can probably be made more efficient using native operations on the dataframe (maybe, flatten the symptom columns for persons of interest, change to True/False indicating >0, then collect into list...?)~ Hmm, misunderstood your comment. It's for a single individual.
Hi all - in light of Matt's suggestions for reorganising and speeding up how the do_at_generic_first_appt_non_emergency
function (and classes that are involved), I've opened #1289 which provides some typehints and a slight refactoring to hopefully make development on this issue slightly easier.
Having taken a dive into the code, I think Matt's idea with some slight modifications is the fastest way to improving performance here.
The Module
base class gets two new methods; do_at_generic_appt_{non_}emergency
that have the same function signature, and should be overwritten by the disease modules that inherit from Module
.
class Module:
def do_at_generic_appt_non_emergency(
target,
read_only_details,
symptoms,
diagnosis_runner,
) -> hsi_events_to_schedule, hsi_event_scheduling_options, dict_of_df_updates
then the existing functions in the hsi_generic_first_appts
module can be reworked as
def do_at_generic_first_appt_non_emergency(hsi_event):
# Make all necessary dataframe reads into immutable types.
# Do this once, at the start of the function so we don't repeatedly make DF accesses.
# Track all changes to the DF that disease modules want to make
all_wanted_df_updates = {}
# Loop over all modules in the simulation to see if they want to schedule an event
for disease_module in modules:
events_to_schedule, event_scheduling_options, df_updates = module.do_at_generic_appt_non_emergency(...)
# Schedule all events that were returned
for event, options in zip(events_to_schedule, event_scheduling_options):
schedule_hsi(event, **options)
# Record, but do not yet implement, any DF changes
all_wanted_df_updates |= df_updates
# After the loop, make DF updates, then return.
Some comments on this:
HSI_Event.apply()
method is called, rather than when they are scheduled? Would appreciate some insight into whether this is a grounded belief!Module
-level function is necessary due to how the rest of the HealthSystem
, HSI_Event
, and supporting classes interact with each other. There's some scope to reduce this complexity with a refactoring if we wanted to - but if we just want to see a performance improvement, this is what we'll need to go with for the time being.hsi_generic_first_appts
can actually be turned into class methods, since they are only called within the HSI_GenericNonEmergencyFirstAppt
class and their single argument is passed as hsi_event=self
.Thoughts and questions are welcome - I will be experimenting with these changes (and requesting some profiling runs) on https://github.com/UCL/TLOmodel/pull/1292 to reflect what's here.
Raising https://github.com/UCL/TLOmodel/issues/407 here which we're also addressing in this issue
In the latest scheduled profiling run, 21.4% of the overall run time was spent in calls to
do_at_generic_first_appt_non_emergency
. Looking at the breakdown of time spent inside the function, most of the time is spent in evauating_LocIndexer.__getitem__
(that is performingloc
indexing operations on the population dataframe), both directly and as part ofSymptomManager.has_what
calls.My understanding of this function is that it acts as an initial trigger for the scheduling of specific HSI events connected with different disease modules, but beyond scheduling further HSI events, should not immediately make any changes to the properties of the individual with the associated
person_id
? @tbhallett can I check this is correct.If this is the case, then we can avoid the repeated dataframe access by doing a single initial read of the individuals properties at the beginning of the function and then passing a datastructure containing these values to all called functions. Ideally we would make this datastructure immutable - for example created a
namedtuple
to be explicit that the properties are read only and cannot be updated. Given a lot of the dataframe accesses are part ofSymptomManager.has_what
calls, we would need to refactor this / provide an alternative method to allowing passing the properties directly (as for example a named tuple), and potentially also evaluate the symptoms the individual has once and pass in explicitly to called functions to avoid overhead in reconstructing the symptom list multiple times.If we do require allowing direct updates to individual properties as part of these calls, and alternative to using an immutable data structure, would be to create a class which will allows changing property values, but records which property are changed and writes these back to the population dataframe at the end of the call.
A possibly nice way to do all of this and also simplify the structure of
do_at_generic_first_appt_non_emergency
(and analagouslydo_at_generic_first_appt_emergency
) would be to add methods (defaulting to a no-op) to the baseModule
class which deal with scheduling the necessary follow-on HSI events from a generic first [non-emergency | emergency] appointment for each disease module, something likewith corresponding functions in
hsi_generic_first_appts
modulewhich avoids the large number of
if
statements and module specific logic in the current implementation. Given it's passed to these function we could also pass throughsqueeze_factor
to the modules methods if its thought this will be used at some point in future.My main concern with this approach would be the difficulty in enforcing the condition that modules
do_at_generic_first_appt_non_emergency
anddo_at_generic_first_appt_emergency
only schedule HSI events and do not edit properties of the individual directly (which would invalidate the approach of reading the properties once at beginning of method). We could just rely on documenting this expectation and adding tests to check it is adhered to.Alternatively at the cost of more complex function signatures, we could make
do_at_generic_first_appt_non_emergency
anddo_at_generic_first_appt_emergency
static methods of the modules, without direct access to the module attributes (including the population data frame). In this case we would need to pass through all objects that would be needed for dealing with scheduling following on HSI events as arguments.