casact / chainladder-python

Actuarial reserving in Python
https://chainladder-python.readthedocs.io/en/latest/
Mozilla Public License 2.0
184 stars 71 forks

[BUG] .ultimate_ does not match the final period of .full_triangle_ #532

Open tobycook97 opened 2 months ago

tobycook97 commented 2 months ago

Describe the bug: When I call .ultimate_ on a fitted Chainladder model, I'd expect the figures to match the final column of .full_triangle_ (that is, its .latest_diagonal). However, they do not. This seems to occur when some of the triangles contain missing values.

To Reproduce

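import chainladder as cl

# Load the 'clrd' sample, which holds many companies and lines of business
tri = cl.load_sample('clrd')

# Estimate development patterns and attach them to the triangle
tri_dev = cl.Development().fit_transform(tri)

# Fit the chainladder model on the developed triangle
model = cl.Chainladder().fit(tri_dev)

ult = model.ultimate_
full_tri = model.full_triangle_

# As an example, compare a single company and line of business
print(ult[(ult.index['GRNAME'] == 'Adriatic Ins Co') & (ult.index['LOB'] == 'othliab')]['IncurLoss'])

print(full_tri[(full_tri.index['GRNAME'] == 'Adriatic Ins Co') & (full_tri.index['LOB'] == 'othliab')]['IncurLoss'])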

Expected behavior: The latest diagonals of full_triangle_ and ultimate_ match.

assert full_tri.latest_diagonal == ult.latest_diagonal
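To make the expected invariant concrete, here is a minimal pure-Python sketch of a volume-weighted chain ladder on a toy cumulative triangle. It is not the library's implementation: the helper names (age_to_age_factors, full_triangle, ultimates) and the toy figures are made up for illustration, and no tail factor is assumed. The ultimate is computed independently as latest value times the cumulative development factor, and should agree with the last column of the filled-in triangle.

```python
import math


def age_to_age_factors(triangle):
    """Volume-weighted link ratios; None marks a cell that is not yet observed."""
    factors = []
    for j in range(len(triangle[0]) - 1):
        pairs = [(row[j], row[j + 1]) for row in triangle
                 if row[j] is not None and row[j + 1] is not None]
        factors.append(sum(b for _, b in pairs) / sum(a for a, _ in pairs))
    return factors


def full_triangle(triangle, factors):
    """Fill every unobserved cell by rolling the previous cell forward."""
    full = []
    for row in triangle:
        filled = list(row)
        for j in range(1, len(filled)):
            if filled[j] is None:
                filled[j] = filled[j - 1] * factors[j - 1]
        full.append(filled)
    return full


def ultimates(triangle, factors):
    """Latest observed value times the cumulative factor to the final age."""
    result = []
    for row in triangle:
        latest_age = max(j for j, v in enumerate(row) if v is not None)
        result.append(row[latest_age] * math.prod(factors[latest_age:]))
    return result


# Hypothetical toy data: three origin periods, three development ages
toy = [
    [100.0, 150.0, 165.0],
    [110.0, 170.0, None],
    [120.0, None, None],
]

factors = age_to_age_factors(toy)
full = full_triangle(toy, factors)
ult = ultimates(toy, factors)

# The invariant the report expects: ultimate equals the last column of the full triangle
assert all(math.isclose(row[-1], u) for row, u in zip(full, ult))
```

With fully populated leading origins, the two routes agree up to floating-point rounding, which is the behaviour the assertion in the report is checking for.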

jbogaardt commented 2 months ago

Hi @tobycook97, thanks for the report. It's very easy to follow, and we will resolve this in the next release. I think we need to investigate why the full_triangle_ property is bugging out on this one; it is possibly related to the very sparsely populated LDFs.

tobycook97 commented 2 months ago

Thanks @jbogaardt, appreciate it.

In case it's useful to anyone, I found that the problem arises when the earliest origin periods of the triangle are null. My workaround is to remove these origins, along with the corresponding development periods.

def remove_na_from_triangle(tri: cl.Triangle, column: str = 'IncurLoss') -> cl.Triangle:
    """
    Remove the leading origin periods that are entirely NA (those not preceded
    by any non-null origin period), plus the development periods left with no data.
    """
    # Sum across development periods (axis=3) and convert to a DataFrame so the
    # null origins can be found (I'm sure there is a better way).
    # ffill ensures only the leading null origins are removed, not ones in the mid years.
    origins_dataframe = tri[column].sum(axis=3).to_frame().ffill()

    non_na_origins = origins_dataframe[origins_dataframe.iloc[:, 0].notna()].index.to_period(freq='Y')

    # Sum across origin periods (axis=2) to find development periods with no data
    devs_dataframe = tri[column].sum(axis=2).to_frame()
    na_devs = devs_dataframe.columns[devs_dataframe.iloc[0].isna()]

    tri = tri[tri.origin.isin(non_na_origins)]  # keep origins from the first populated one onward
    tri = tri[~tri.development.isin(na_devs)]  # drop the empty development periods

    return tri

The drawback is that the function handles one triangle at a time, so if you want to fit each triangle individually you will need to iterate over the index and remove the NA values from each one. (This is unlikely to matter, though, as you will probably want to derive development patterns at a higher level than the one shown in the example above.)
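The trimming logic of the workaround above can be sketched without the library, on a plain list-of-lists triangle. This is only an illustration of the idea, not the chainladder API: the function name trim_leading_null_origins and the toy data are hypothetical, and None stands in for a missing value. It shows why dropping the leading empty origins also means dropping the latest development periods, since only those oldest origins could have reached them.

```python
def trim_leading_null_origins(triangle):
    """Drop leading all-None origin rows, then development columns left with no data.

    Assumes at least one origin row contains an observed value.
    """
    # Index of the first origin row that has any observed value
    first = next(i for i, row in enumerate(triangle)
                 if any(v is not None for v in row))
    rows = triangle[first:]  # all-None rows in the middle are kept on purpose

    # Keep only development columns that still contain at least one value
    keep_cols = [j for j in range(len(rows[0]))
                 if any(row[j] is not None for row in rows)]
    return [[row[j] for j in keep_cols] for row in rows]


# Hypothetical toy data: the two earliest origins are entirely missing
toy_sparse = [
    [None, None, None, None],
    [None, None, None, None],
    [100.0, 150.0, None, None],
    [110.0, None, None, None],
]

trimmed = trim_leading_null_origins(toy_sparse)
```

Here the result is a two-by-two triangle: the two empty origins are gone, and so are the last two development ages, which no remaining origin has reached.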