casact / chainladder-python

Actuarial reserving in Python
https://chainladder-python.readthedocs.io/en/latest/
Mozilla Public License 2.0

Sigma Bug in Development Estimator #463

Closed chrisbooden closed 1 year ago

chrisbooden commented 1 year ago

Describe the bug

When estimating standard errors, it is common to exclude historical years or link ratios that are extreme outliers. When link ratios are excluded through the `drop` argument of the Development estimator, it seems to correctly exclude them from the sum in the sigma calculation but still includes them in the count n used for the 1/(n-1) factor.

In our company data we often exclude a large number of historical years because they are not reflective of current business, so the resulting sigma values come out too low as a result of this bug.
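To make the effect of the count concrete, here is a minimal NumPy sketch for a single development period. It assumes the volume-weighted Mack formula, sigma^2 = 1/(n-1) * sum(C_i * (f_i - f)^2); it is not chainladder's internal code, and the function name `mack_sigma`, the `count_dropped` flag and the cumulative values are all hypothetical, chosen only for illustration.

```python
import numpy as np


def mack_sigma(c_start, c_end, keep, count_dropped=False):
    """Volume-weighted sigma for one development period (illustrative sketch).

    c_start, c_end: cumulative values at the start and end of the period.
    keep: boolean mask, False for link ratios that are excluded.
    count_dropped: if True, mimic the reported behaviour by counting the
        excluded ratios in n.
    """
    c_start = np.asarray(c_start, dtype=float)
    c_end = np.asarray(c_end, dtype=float)
    keep = np.asarray(keep, dtype=bool)

    ratios = c_end / c_start
    weights = c_start * keep  # excluded ratios get zero weight
    f = (weights * ratios).sum() / weights.sum()  # volume-weighted factor

    n = len(ratios) if count_dropped else int(keep.sum())
    return np.sqrt((weights * (ratios - f) ** 2).sum() / (n - 1))


# Hypothetical data: the first origin year is an outlier and is excluded.
c_start = [1000.0, 1200.0, 900.0, 1100.0]
c_end = [3000.0, 1500.0, 1080.0, 1430.0]
keep = [False, True, True, True]

correct = mack_sigma(c_start, c_end, keep)
too_low = mack_sigma(c_start, c_end, keep, count_dropped=True)
# Removing the row outright should agree with the masked calculation.
row_removed = mack_sigma(c_start[1:], c_end[1:], [True, True, True])
```

In this sketch `too_low` is smaller than `correct` by a factor of sqrt((3-1)/(4-1)), and the shortfall grows as more ratios are excluded, which is the pattern described above.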

To Reproduce

import chainladder as cl

tri = cl.load_sample('raa')

tri_dev = cl.Development(drop = [('1981',12),('1981',24),('1981',36),('1981',48),('1981',60),('1981',72),('1981',84),('1981',96),('1981',108)]).fit_transform(tri)

display(tri_dev.sigma_)

Expected behavior

import chainladder as cl
import pandas as pd

df = pd.read_csv('chainladder/utils/data/raa.csv')
df = df[df["origin"] > 1981]

tri_2 = cl.Triangle(
    df,
    origin="origin",
    development="development",
    columns="values",
    cumulative=True
)

tri_dev_2 = cl.Development().fit_transform(tri_2)

display(tri_dev_2.sigma_)

Or as a unit test:

# Unit test for checking sigma values in the Development estimator
import chainladder as cl
import pandas as pd  # needed for pd.read_csv below

def test_dev_sigma():
    # Method 1 for estimating sigma by excluding the first origin year 
    tri = cl.load_sample('raa')
    tri_dev = cl.Development(drop = [('1981',12),('1981',24),('1981',36),('1981',48),('1981',60),('1981',72),('1981',84),('1981',96),('1981',108)]).fit_transform(tri)

    # Remove the interpolated last value and the prev value (as this will be interpolated by the next method)
    sigma_1 = tri_dev.sigma_.iloc[0,0,0,:-2].values

    # Method 2 for estimating sigma by excluding the first origin year from the original data set
    df = pd.read_csv('chainladder/utils/data/raa.csv')
    df = df[df["origin"] > 1981]

    tri_2 = cl.Triangle(
        df,
        origin="origin",
        development="development",
        columns="values",
        cumulative=True
    )

    tri_dev_2 = cl.Development().fit_transform(tri_2)

    # Remove the interpolated last value (now array will have same length as method 1)
    sigma_2 = tri_dev_2.sigma_.iloc[0,0,0,:-1].values

    # Take difference and convert to a single list, round to a suitable value. Expected diffs are zero
    diff_sigma = [round(y,4) for y in (sigma_1 - sigma_2)[0][0][0]]

    # Expected diffs
    zeros = [0 for i in range(len(diff_sigma))]

    assert diff_sigma == zeros

jbogaardt commented 1 year ago

Wow, I'm surprised that this defect exists. This is a super helpful bug report and unit test. Thank you for identifying it, @chrisbooden. We'll prioritize it for the next release.

jbogaardt commented 1 year ago

Released in v0.8.18

chrisbooden commented 1 year ago

Top man. Just re-tested this in the new release and can confirm it's resolved.