Generate tables of results, which are extracted from a data source.
Part of the Botech ecosystem, written by Forecast Health Australia.
This repository was created because we were generating a lot of health-economic model data, often from different sources, and we wanted to compare it in lots of ways.
I was frustrated with hard coding comparisons, and felt there was a better way to do it. There are lots of solutions to this type of thing, but it seemed marginally easier to write a new package that we could modify as our use cases changed.
You will need a csv of Records.
git clone https://github.com/ForecastHealth/botech-comparisons.git
pip install -r requirements.txt
(use a virtual environment)
There are two Python dataclasses and one enum which are important to understand: the Record, the Comparison and the Filter.
You can find their definitions in the datatypes module.
In particular:
- Record is important, because we expect the underlying database to be a CSV file with the schema of a Record.
- Filter is important, because you can filter and group using these elements.
Write a config.json, which defines the following:
- data_type: the type of table you want to return, one of:
  - blueprint (note - this is probably not useful unless you already know what it is)
  - filtered_records (individual records)
  - comparisons (comparisons of records - probably what you want)
- data_format: the format of the table you want to return: csv, html, dataframe, or self.
  - dataframe is a pandas.DataFrame
  - self is a list of the data type
- scenarios: a list of exactly two elements, where each element corresponds to a scenario. These must be labelled in your dataset, e.g. baseline and scale-up.
- groups: a list of lists, with each nested list being one way you want to present the data. For instance, the list ["region", "income"] means you want the data to be presented by region by income, e.g. "North America x High Income", "Oceania x Low Income", etc.
- filters: a dictionary of Filters, where each value is a list of the values that you want to include, e.g.
  - "income": ["HIGH INCOME"] will only include results from high income countries
  - "country": ["BRA", "MOZ"] will only include results from Brazil and Mozambique
For example:
{
    "data_type": "comparisons",
    "data_format": "html",
    "scenarios": ["baseline", "scaleup"],
    "groups": [
        ["region", "income"]
    ],
    "filters": {
        "income": ["HIGH INCOME"],
        "intervention": [0]
    }
}
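Before passing a configuration to the package, it can help to check it against the rules listed above. The following is a minimal sketch using only the standard library. The helper name validate_config is hypothetical and is not part of this package, and treating groups and filters as optional is an assumption.

```python
import json

# Allowed values, as listed in this README.
DATA_TYPES = {"blueprint", "filtered_records", "comparisons"}
DATA_FORMATS = {"csv", "html", "dataframe", "self"}


def validate_config(config):
    """Return a list of problems found in a config dict.

    Hypothetical helper for illustration only; not part of the package.
    """
    problems = []
    if config.get("data_type") not in DATA_TYPES:
        problems.append(f"data_type must be one of {sorted(DATA_TYPES)}")
    if config.get("data_format") not in DATA_FORMATS:
        problems.append(f"data_format must be one of {sorted(DATA_FORMATS)}")

    scenarios = config.get("scenarios")
    if not isinstance(scenarios, list) or len(scenarios) != 2:
        problems.append("scenarios must be a list of exactly two labels")

    # Assumption: groups and filters may be omitted.
    groups = config.get("groups", [])
    if not isinstance(groups, list) or not all(isinstance(g, list) for g in groups):
        problems.append("groups must be a list of lists")

    filters = config.get("filters", {})
    if not isinstance(filters, dict) or not all(
        isinstance(values, list) for values in filters.values()
    ):
        problems.append("filters must map each Filter name to a list of values")
    return problems


example = json.loads("""
{
    "data_type": "comparisons",
    "data_format": "html",
    "scenarios": ["baseline", "scaleup"],
    "groups": [["region", "income"]],
    "filters": {"income": ["HIGH INCOME"], "intervention": [0]}
}
""")

print(validate_config(example))  # prints []
```

An empty list means the configuration satisfies the structural rules described here; it does not check that the scenario labels or filter values actually appear in your dataset.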
Please refer to __init__.py to read the high-level API, create_tables().
The configuration can be created by parsing a JSON configuration using parse_configurations, and the data will need to be provided by the user and parsed using something like pandas.read_csv().
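To make the meaning of filters, groups and scenarios concrete, here is a rough standard-library sketch of what they describe. It is not the package's implementation and does not call create_tables(). The column names scenario and value, the row values, and the choice to report the difference between the two scenarios are all assumptions made for illustration.

```python
from collections import defaultdict

# Made-up rows; "scenario" and "value" are assumed column names.
rows = [
    {"country": "USA", "region": "North America", "income": "HIGH INCOME", "scenario": "baseline", "value": 100.0},
    {"country": "USA", "region": "North America", "income": "HIGH INCOME", "scenario": "scaleup", "value": 120.0},
    {"country": "AUS", "region": "Oceania", "income": "HIGH INCOME", "scenario": "baseline", "value": 50.0},
    {"country": "AUS", "region": "Oceania", "income": "HIGH INCOME", "scenario": "scaleup", "value": 65.0},
    {"country": "MOZ", "region": "Africa", "income": "LOW INCOME", "scenario": "baseline", "value": 30.0},
    {"country": "MOZ", "region": "Africa", "income": "LOW INCOME", "scenario": "scaleup", "value": 45.0},
]


def apply_filters(rows, filters):
    """Keep rows whose value for every filtered field is in the allowed list."""
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in filters.items())
    ]


def compare(rows, group, scenarios):
    """Sum `value` per group and scenario, then return second minus first."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = " x ".join(str(row[field]) for field in group)
        totals[key][row["scenario"]] += row["value"]
    first, second = scenarios
    return {key: by_scenario[second] - by_scenario[first]
            for key, by_scenario in totals.items()}


filtered = apply_filters(rows, {"income": ["HIGH INCOME"]})
result = compare(filtered, ["region", "income"], ["baseline", "scaleup"])
print(result)
# {'North America x HIGH INCOME': 20.0, 'Oceania x HIGH INCOME': 15.0}
```

The filter drops the low income rows first, and the group ["region", "income"] then produces one result per region and income combination.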
Feel free to fork, or submit an issue. If you'd like to be added as a contributor, please message me, or contact us through the Forecast Health Australia website.
Tests are built with unittest and can be run locally from the root directory using:
python -m unittest discover tests
Rory Watts, Forecast Health Australia
This repository is licensed under the Apache 2.0 License.