Closed: cladier closed this issue 7 months ago
I think this is an issue of ambiguity. The Triangle signature asks for a lot of things already, and we aim to keep optional as many items as can be inferred from the data. The issue we're running into is that we have one accident month with two development periods. With only that information, the constructor has to guess the rest.
So in this case, it's creating annual accident periods with a fiscal period from June 2014 through May 2015, taking the origin label from the beginning of the period. It does this because the trailing argument in the constructor defaults to True, which here assumes a fiscal period ending in May.
>>> tri = cl.Triangle(
... db,
... origin="claimDate",
... development="viewDate",
... columns="ones",
... cumulative=False
... )
>>> tri.origin
PeriodIndex(['2014', '2015'], dtype='period[A-MAY]', name='origin')
>>> tri.origin_grain
'Y'
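For anyone reading along, here is a minimal sketch in plain Python of why a May year end produces a 2014 label. It is not chainladder's actual code; the helper name fiscal_year_bounds is made up for illustration, and the claim date is the 2015-05-22 used in the example further down the thread.

```python
import calendar
from datetime import date


def fiscal_year_bounds(d, year_end_month):
    """Return (start, end) of the 12-month fiscal period containing d.

    Hypothetical helper for illustration only, not part of chainladder.
    """
    # The period closes at the first `year_end_month` on or after d's month.
    end_year = d.year if d.month <= year_end_month else d.year + 1
    last_day = calendar.monthrange(end_year, year_end_month)[1]
    end = date(end_year, year_end_month, last_day)
    # The period opens on the first day after the previous fiscal close.
    if year_end_month == 12:
        start = date(end_year, 1, 1)
    else:
        start = date(end_year - 1, year_end_month + 1, 1)
    return start, end


claim = date(2015, 5, 22)

may_start, may_end = fiscal_year_bounds(claim, year_end_month=5)
dec_start, dec_end = fiscal_year_bounds(claim, year_end_month=12)

print(may_start, may_end)  # 2014-06-01 2015-05-31
print(dec_start, dec_end)  # 2015-01-01 2015-12-31

# Labelling the period by its first day gives 2014 for a May year end
# and 2015 for a December year end.
print(may_start.year, dec_start.year)  # 2014 2015
```

With a May close, the claim sits in the last month of a period that opened in June 2014, so a label taken from the start of the period reads 2014 even though the claim itself is from 2015.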
You can override the trailing argument and get the desired behavior:
>>> tri = cl.Triangle(
... db,
... origin="claimDate",
... development="viewDate",
... columns="ones",
... cumulative=False,
... trailing=False  # add this to coerce to a December year end
... )
>>> tri.origin
PeriodIndex(['2015'], dtype='period[A-DEC]', name='origin')
>>> tri
        6    7
2015  2.0  2.0
When you don't know that some countries have a fiscal year ending in May, the origin labelling is indeed confusing. Makes sense now, thanks a lot!
Glad this helps. Technically, the default behavior for trailing=True is to assume the latest origin month in your data as the fiscal close. I'm not sure how many companies actually use a May year end globally; probably very few.
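As a rough illustration of that rule (a sketch of the behavior described above, not the library's implementation), the fiscal close can be read off the latest origin date in the data. Both helper names are hypothetical, and the "A-MAY" style alias simply mirrors the dtype shown in the tri.origin output earlier in the thread.

```python
from datetime import date

# Month abbreviations as they appear in pandas-style annual aliases.
MONTH_ABBR = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def inferred_year_end_month(origin_dates):
    """Hypothetical helper: month of the latest origin date in the data."""
    return max(origin_dates).month


def annual_freq_alias(year_end_month):
    """Hypothetical helper: build an alias such as 'A-MAY' or 'A-DEC'."""
    return "A-" + MONTH_ABBR[year_end_month - 1]


# All claims in the example share a single origin date in May 2015.
origin_dates = [date(2015, 5, 22)] * 4

month = inferred_year_end_month(origin_dates)
print(month)                     # 5
print(annual_freq_alias(month))  # A-MAY
print(annual_freq_alias(12))     # A-DEC, the effect of trailing=False
```

With only one origin month in the data, the latest month is that month by definition, which is why a small sample from May ends up with a May fiscal close.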
Hi @jbogaardt, sorry to bother you again, but there's a behaviour I don't quite get, and I suspect it's the same kind of problem:
import pandas as pd
from pandas import Timestamp
import chainladder as cl

db = pd.DataFrame({'ones': {14: 1, 15: 1, 16: 1, 17: 1},
                   'claimDate': {14: Timestamp('2015-05-22 00:00:00'),
                                 15: Timestamp('2015-05-22 00:00:00'),
                                 16: Timestamp('2015-05-22 00:00:00'),
                                 17: Timestamp('2015-05-22 00:00:00')},
                   'viewDate': {14: Timestamp('2016-01-31 00:00:00'),
                                15: Timestamp('2016-01-31 00:00:00'),
                                16: Timestamp('2016-02-29 00:00:00'),
                                17: Timestamp('2016-02-29 00:00:00')}})
cl.Triangle(db,
            origin="claimDate",  # accident (origin) date column in db
            development="viewDate",
            columns="ones",
            cumulative=False,
            trailing=False
            )
=> Why are the 4 values grouped together and not split between 2 different development ages?
I've built a bigger table where I view these same two claims at the end of each month, and I sometimes see months skipped and their values passed to the next month, but I have no clue why this happens.
This is related to #494 and is a result of changes in pandas>=2.2. We've patched the master branch to accommodate them, and it works as intended there. We just need to get a release out to PyPI.
Indeed, just installing from GitHub fixes this. Thanks a lot for all the amazing work!
I'm not getting the correct behaviour; the example will speak for itself:
=> Why does the origin start in 2014?
Desktop: pandas: 2.2.1, numpy: 1.24.4, chainladder: 0.8.18