META: Improvements to design and implementation of TLOmodel framework

UCL / TLOmodel

Epidemiology modelling framework for the Thanzi la Onse project

https://www.tlomodel.org/

MIT License


META: Improvements to design and implementation of TLOmodel framework #480

Open tamuri opened 2 years ago

tamuri commented 2 years ago

This issue serves as a scratch space to collect ideas related to the improvement of the software design and implementation of the TLO framework.

The hope is the proposed improvements will be executed for the TLM project.

Ideas can be considered here and when good/developed can turn into a dedicated issue.

tamuri commented 2 years ago

Write logs directly to HDF5 file. Benefits:

Store heterogeneous structured data
Smaller file sizes whilst simulation is running, can handle large amounts of data
Can enforce stricter type control
Doesn't require post-processing step to parse logs

See: h5py docs, pandas read_hdf, PyTables.

Or Parquet

For processing larger-than-memory logs, consider Dask.
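As a rough illustration of the HDF5 idea, here is a minimal sketch that appends typed records to a resizable dataset with h5py. The log key `demography/death`, the field names and the `append_records` helper are hypothetical and not part of the TLO framework; the fixed compound dtype is what provides the stricter type control.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical record layout for one log key. Every appended record must
# conform to this dtype, which is where the type control comes from.
RECORD_DTYPE = np.dtype([("date", "S10"), ("person_id", "i8"), ("age_years", "f8")])


def append_records(dataset, records):
    """Append a structured array of records to a resizable 1-D HDF5 dataset."""
    n_old = dataset.shape[0]
    dataset.resize((n_old + len(records),))
    dataset[n_old:] = records


log_path = os.path.join(tempfile.mkdtemp(), "simulation_log.h5")

with h5py.File(log_path, "w") as log_file:
    # maxshape=(None,) makes the dataset extendable along its only axis.
    deaths = log_file.create_dataset(
        "demography/death", shape=(0,), maxshape=(None,), dtype=RECORD_DTYPE, chunks=True
    )
    append_records(deaths, np.array([(b"2010-01-05", 12, 71.3)], dtype=RECORD_DTYPE))
    append_records(
        deaths,
        np.array([(b"2010-02-11", 47, 0.4), (b"2010-03-02", 3, 88.0)], dtype=RECORD_DTYPE),
    )

# Reading back needs no log-parsing step: the result is a typed NumPy array.
with h5py.File(log_path, "r") as log_file:
    logged_deaths = log_file["demography/death"][:]
```

The array read back can be passed straight to `pandas.DataFrame(logged_deaths)` for analysis.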

matt-graham commented 2 years ago

Interactive dashboard for live visualisation of model state during runs. This could include things like time series of summary statistics of the population and visualisation of current state of event queues, nicer display of the log output (e.g. formatting to take advantage of structured output, ability to filter or search).

This could be useful both for providing richer information when debugging runs and for providing a nice interface for monitoring runs and visualising their results once finished. Ideally the displayed elements, statistics etc. would be configurable, to allow creating visualisations appropriate for different use cases.

For the interface, one possibility would be to create a browser-based app using something like Dash, with a backend app either directly running the TLO model or running a simulation in a separate process and getting information via the log files or other file-based output. An alternative would be to produce some form of PyCharm plug-in. This could allow tighter integration into the PyCharm interface (e.g. exploiting the built-in interactive debugger) but it would probably require some Java expertise.

matt-graham commented 2 years ago

Increased automation of model calibration.

Having automated model calibration runs as part of continuous integration. This could be at the level of automatically producing the summary statistics / plots currently used to calibrate different model components and recording them as part of regular scheduled CI, which would, for example, allow (manual) checking of how calibration is altered as PRs are merged in.
Automated fitting of parameters (or a distribution over parameters) to calibrate against data. If we just want a single point estimate of parameters that gives model outputs consistent with data, we could use a Bayesian optimisation approach such as Bayesian adaptive direct search. We could also take a more fully Bayesian approach and try to infer a posterior over the parameter space given prior distributions over the parameters, the model and observed data; the ELFI package could be useful here as it provides Python implementations of various approximate Bayesian computation algorithms.
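To make the approximate Bayesian computation idea concrete, here is a minimal rejection-sampling sketch in plain NumPy. It does not use ELFI's API, and the Poisson "model", the uniform prior, the summary statistic and the tolerance are all toy assumptions standing in for a real simulation run and calibration target.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_counts(rates, n_obs, rng):
    """Toy stand-in for a model run: n_obs Poisson counts per candidate rate."""
    return rng.poisson(rates[:, None], size=(len(rates), n_obs))


# "Observed" data generated from a known rate so the result can be sanity-checked.
true_rate = 3.0
n_obs = 200
observed_summary = rng.poisson(true_rate, size=n_obs).mean()

# 1. Draw candidate parameters from the prior (assumed uniform on [0, 10]).
n_draws = 5000
prior_draws = rng.uniform(0.0, 10.0, size=n_draws)

# 2. Simulate from the model for each candidate and reduce to a summary statistic.
simulated_summaries = simulate_counts(prior_draws, n_obs, rng).mean(axis=1)

# 3. Keep candidates whose simulated summary is close to the observed one.
tolerance = 0.2
accepted = prior_draws[np.abs(simulated_summaries - observed_summary) < tolerance]

# The accepted draws approximate the posterior over the rate.
posterior_mean = accepted.mean()
```

For an expensive simulation, plain rejection sampling like this would need far too many runs, which is one reason to look at the more sample-efficient algorithms a dedicated package offers.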

tamuri commented 2 years ago

Reuse rows of deceased individuals in the population dataframe.

Once an individual dies their properties, in principle, do not change. There is some post-processing but otherwise the row is ignored. New individuals born into the population are new rows added to the population dataframe. This means the population dataframe keeps growing throughout the simulation run.

We could consider reusing rows of deceased individuals for new births. This would require some rework of the population index. Currently, the population dataframe index is a simple range index, which serves as both person ID and position in the dataframe. How about using a UUID as the index? This would be the person ID and would be used for operating on individuals. When an individual dies, the row could be processed and logged somewhere. Then the index for that row could be replaced with a new UUID. It is likely that df.rename(index={'uuid1':'uuid2'}, inplace=True) would be faster than growing the dataframe by appending rows. It would also keep the population dataframe as small as possible, which would speed up all other operations.
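A minimal sketch of the row-reuse idea, assuming a UUID-indexed dataframe. The column names and the `reuse_row_for_birth` helper are hypothetical, and whether this is actually faster than appending rows would need benchmarking.

```python
import uuid

import pandas as pd


def new_person_id() -> str:
    """Generate a fresh, unique person ID."""
    return uuid.uuid4().hex


# Toy population indexed by UUID rather than by a range index.
population = pd.DataFrame(
    {"is_alive": [True, False, True], "age_years": [34, 71, 5]},
    index=pd.Index([new_person_id() for _ in range(3)], name="person_id"),
)


def reuse_row_for_birth(df, deceased_id, newborn_properties):
    """Relabel a deceased individual's row and overwrite it with a newborn's properties."""
    newborn_id = new_person_id()
    # Swap the index label in place; the number of rows does not change.
    df.rename(index={deceased_id: newborn_id}, inplace=True)
    for name, value in newborn_properties.items():
        df.at[newborn_id, name] = value
    return newborn_id


# In a real run the deceased row would be logged before being recycled.
deceased_id = population.index[~population["is_alive"]][0]
n_rows_before = len(population)
newborn_id = reuse_row_for_birth(
    population, deceased_id, {"is_alive": True, "age_years": 0}
)
```

One consequence of this design is that anything holding a person ID (for example, events already scheduled for that individual) must be invalidated or cleared before the row is recycled.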

tamuri commented 2 years ago

Checks and validation of inputs by the framework, which can be disabled if necessary.

At the moment, if an error arises in the simulation (e.g. the addition of a column to the population dataframe), this is only picked up after the simulation is complete or when parsing the logs. We could add stringent checks within the framework to ensure the integrity of the dataframe as the simulation runs, e.g. after every event, check the names and types of the columns; if they have changed, exit immediately with an error, noting the event that just ran. If these checks are asserts, then they are easy to switch off. Candidate checks:

the population dataframe still matches the defined property names and types
the logged data for a particular key has a consistent structure
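The first of these checks could be sketched as follows. The expected schema, the two example events and the `run_event_with_checks` wrapper are hypothetical; the real framework would derive the expected names and types from the declared module properties.

```python
import pandas as pd

# Hypothetical expected schema: column name -> dtype name.
EXPECTED_DTYPES = {"is_alive": "bool", "age_years": "int64"}


def check_population_schema(df, expected_dtypes, event_name):
    """Assert that column names and dtypes are unchanged after an event."""
    actual_dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}
    assert actual_dtypes == expected_dtypes, (
        f"Population dataframe schema changed after event {event_name!r}: "
        f"expected {expected_dtypes}, got {actual_dtypes}"
    )


def run_event_with_checks(df, event):
    """Run one event, then validate the dataframe straight away."""
    event(df)
    check_population_schema(df, EXPECTED_DTYPES, event.__name__)


def ageing_event(df):
    # Well-behaved event: updates values but leaves the schema alone.
    df["age_years"] += 1


def faulty_event(df):
    # Buggy event: silently adds an undeclared column.
    df["typo_column"] = 0


population = pd.DataFrame(
    {
        "is_alive": pd.Series([True, True, False], dtype="bool"),
        "age_years": pd.Series([34, 5, 71], dtype="int64"),
    }
)

run_event_with_checks(population, ageing_event)

try:
    run_event_with_checks(population, faulty_event)
except AssertionError as error:
    # The message names the offending event, so the run can stop immediately.
    failure_message = str(error)
```

Because the check is a plain `assert`, running Python with the `-O` flag strips it out, which gives the "can be disabled if necessary" behaviour without extra configuration.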

tamuri commented 2 years ago

CI for later versions of Python and unpinned dependencies.

Set up a workflow to run the TLOmodel tests against recent/latest (?) version of Python and unpinned dependencies. Perhaps run once a month? It would help prepare for upgrading the target Python version and pinned dependencies. Track upcoming errors, warnings & deprecations. (This might be worth doing sooner rather than later...?)