This package is for age-period-cohort and extended chain-ladder analysis. It allows for model estimation and inference, visualization, misspecification testing, distribution forecasting and simulation. The package covers binomial, (generalized) log-normal, normal, over-dispersed Poisson and Poisson models. The common factor is a linear age-period-cohort predictor. The package uses the identification method by Kuang et al. (2008), implemented as described by Nielsen (2015), who also discusses the use of the R package apc, which inspired this package.
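The identification problem that motivates the method of Kuang et al. (2008) can be checked numerically. Because the period index is fixed by age and cohort, a linear trend can be shifted between the age, period and cohort effects without changing the predictor. The plain-Python sketch below is only an illustration, not the package's identification code. It assumes 1-based indices with period p = a + c - 1, and the effect values are random numbers chosen for illustration.

```python
# Illustration only (not apc code). Assumption: 1-based age index a,
# cohort index c, and period index p = a + c - 1.
import random


def apc_predictor(alpha, beta, gamma, a, c):
    """Linear APC predictor alpha_a + beta_p + gamma_c with p = a + c - 1."""
    p = a + c - 1
    return alpha[a] + beta[p] + gamma[c]


def tilt(effects, slope, sign):
    """Add a linear trend sign * slope * index to every effect."""
    return {k: v + sign * slope * k for k, v in effects.items()}


random.seed(0)
n_age, n_coh = 4, 5
ages = range(1, n_age + 1)
cohorts = range(1, n_coh + 1)
periods = range(1, n_age + n_coh)  # p runs from 1 to n_age + n_coh - 1

# Arbitrary (random) age, period and cohort effects.
alpha = {a: random.gauss(0, 1) for a in ages}
beta = {p: random.gauss(0, 1) for p in periods}
gamma = {c: random.gauss(0, 1) for c in cohorts}

# Move a linear trend of slope delta between the three components.
delta = 0.7
alpha2 = tilt(alpha, delta, +1)
beta2 = tilt(beta, delta, -1)
gamma2 = tilt(gamma, delta, +1)

# New predictor = old predictor + delta * (a + c - p) = old predictor + delta,
# so every cell shifts by the same constant, which the level absorbs.
diffs = [
    apc_predictor(alpha2, beta2, gamma2, a, c) - apc_predictor(alpha, beta, gamma, a, c)
    for a in ages
    for c in cohorts
]
print(all(abs(d - delta) < 1e-12 for d in diffs))  # True
```

Quantities that are unchanged by such shifts, such as second differences of the effects, are identified; the individual linear trends are not.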
Version 1.0.2 fixes some bugs introduced by pandas 0.25.0; apc 1.0.2 now requires pandas >= 0.24.0. Further, this version refactors some of the unit tests and removes deprecated behavior.
Version 1.0.1 fixes some typos and refactors production code.
Version 1.0.0 adds a number of new features. Among them are the following:
import apc
model = apc.Model()
model.data_from_df(pandas.DataFrame)
model.plot_data_sums()
model.plot_data_heatmaps()
model.plot_data_within()
model.fit(family, predictor)
model.plot_residuals()
model.identify()
model.plot_parameters()
model.fit_table()
apc.r_test(pandas.DataFrame, family_null, predictor)
model.sub_model(age_from_to, per_from_to, coh_from_to)
apc.bartlett_test(sub_models)
apc.f_test(model, sub_models)
model.forecast()
model.plot_forecast()
model.simulate(repetitions)
The package includes vignettes that replicate the empirical applications of a number of papers.
The following data are included in the package.
These data are counts of mesothelioma deaths in the UK in age-period space. They may be modeled with a Poisson model with "APC" or "AC" predictor. The data can be loaded by calling apc.asbestos().
Source: Martinez Miranda et al. (2015).
These data include counts of deaths from lung cancer in Belgium in age-period space. The dataset also includes a measure of exposure. It can be analyzed using a Poisson model with an "APC", "AC", "AP" or "Ad" predictor. The data can be loaded by calling apc.Belgian_lung_cancer().
Source: Clayton and Schifflers (1987).
Data for an insurance run-off triangle in cohort-age (accident-development year) space. These data are pre-formatted. They are well known to require a period/calendar effect for modeling and may be modeled with an over-dispersed Poisson model with "APC" predictor. The data can be loaded by calling apc.loss_BZ().
Source: Barnett and Zehnwirth (2000).
Data for an insurance run-off triangle in cohort-age (accident-development year) space. These data are pre-formatted.
They may be modeled with an over-dispersed Poisson model, for instance with "AC" predictor. The data can be loaded by calling apc.loss_TA().
Source: Taylor and Ashe (1983).
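For run-off triangles such as these, the over-dispersed Poisson model with "AC" predictor is known to reproduce the classical chain-ladder point forecasts. The sketch below runs the chain-ladder algorithm directly in plain Python; it is not the package's estimation code. The 3x3 triangle is hypothetical (not taken from apc.loss_TA()), indices are assumed 1-based, and a cell with cohort c and age a is assumed to belong to calendar period c + a - 1, so the cells to forecast are those whose period lies beyond the last observed one.

```python
# Chain-ladder sketch on a HYPOTHETICAL 3x3 incremental run-off triangle in
# cohort-age (accident-development year) layout. Not one of the apc datasets.
# Assumption: 1-based indices, calendar period p = cohort + age - 1; a cell is
# observed when p <= K and must be forecast when p > K.

K = 3
tri = {  # (cohort, age) -> incremental amount (made-up numbers)
    (1, 1): 100.0, (1, 2): 50.0, (1, 3): 10.0,
    (2, 1): 120.0, (2, 2): 60.0,
    (3, 1): 90.0,
}


def cumulative(tri, K):
    """Cumulate the observed incremental amounts along each cohort."""
    cum = {}
    for c in range(1, K + 1):
        running = 0.0
        for a in range(1, K - c + 2):  # observed ages for cohort c
            running += tri[(c, a)]
            cum[(c, a)] = running
    return cum


def chain_ladder(tri, K):
    """Return development factors and incremental forecasts for future cells."""
    cum = cumulative(tri, K)
    # Factor from age a to a + 1, pooled over cohorts observed at both ages.
    factors = {}
    for a in range(1, K):
        rows = [c for c in range(1, K + 1) if (c, a + 1) in cum]
        factors[a] = sum(cum[(c, a + 1)] for c in rows) / sum(cum[(c, a)] for c in rows)
    forecasts = {}
    for c in range(2, K + 1):
        last_age = K - c + 1
        level = cum[(c, last_age)]
        for a in range(last_age + 1, K + 1):
            new_level = level * factors[a - 1]
            forecasts[(c, a)] = new_level - level  # back to incremental scale
            level = new_level
    return factors, forecasts


factors, forecasts = chain_ladder(tri, K)
for (c, a), value in sorted(forecasts.items()):
    p = c + a - 1  # calendar period; p > K for every future cell
    print(f"cohort {c}, age {a}, period {p}: {value:.2f}")
```

Running it prints the three future cells with forecasts 12.00, 45.00 and 9.00 in periods 4, 4 and 5.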
Data for an insurance run-off triangle of paid amounts (units not reported) in cohort-age (accident-development year) space.
The data are from Codan, the Danish subsidiary of Royal & Sun Alliance.
It is a portfolio of third-party liability from motor policies. The time units are years.
Apart from the paid amounts, counts of the number of reported claims are available. The paid amounts may be modeled with an over-dispersed Poisson model with "APC" predictor. The data can be loaded by calling apc.loss_VNJ().
Source: Verrall et al. (2010).
These US casualty data are from the insurer XL Group. Entries are gross paid and reported loss and allocated loss adjustment expense in 1000 USD. Kuang and Nielsen (2018) consider a generalized log-normal model with "AC" predictor for these data. The data can be loaded by calling apc.loss_KN().
Forecasting does not work as intended if data_format was not CA or AC. The problem is that the forecasting design is generated by first casting the data into an AC array, from which the future period index is generated.
The index labels of data_vector as output by Model().data_as_df() are strings. Thus, sorting may yield unintuitive results when the range components differ in length. For example, sorting 1-3, 4-9, 10-11 yields the ordering 1-3, 10-11, 4-9. This results in mislabeling of the coefficient names later on, since those are taken from sorted indices. A possible, if ugly, fix could be to pad the ranges with zeros as needed.
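The sorting problem and the zero-padding fix suggested above can be reproduced with plain Python strings. The labels below are hypothetical stand-ins for the range labels, and the helper zero_pad is illustrative, not part of apc. Sorting on the numeric bounds is shown as an alternative that leaves the labels unchanged.

```python
# Hypothetical range labels of different lengths (not taken from apc output).
labels = ["4-9", "10-11", "1-3"]

# Plain string sorting compares character by character, so "10-11" < "4-9".
print(sorted(labels))  # ['1-3', '10-11', '4-9']


def zero_pad(label, width):
    """Hypothetical helper: pad both bounds, e.g. '1-3' -> '01-03' for width 2."""
    start, end = label.split("-")
    return f"{int(start):0{width}d}-{int(end):0{width}d}"


# Pad to the widest bound so that string order matches numeric order.
width = max(len(part) for label in labels for part in label.split("-"))
padded = [zero_pad(label, width) for label in labels]
print(sorted(padded))  # ['01-03', '04-09', '10-11']

# Alternative that keeps the original labels: sort on the numeric bounds.
print(sorted(labels, key=lambda s: tuple(int(x) for x in s.split("-"))))
# ['1-3', '4-9', '10-11']
```

Padding changes the labels themselves, whereas a numeric sort key only changes the order in which they are visited.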