casact / chainladder-python

Actuarial reserving in Python
https://chainladder-python.readthedocs.io/en/latest/
Mozilla Public License 2.0

Having some trouble with origin periods #367

Closed henrydingliu closed 1 year ago

henrydingliu commented 2 years ago

Trying to bring in premium data for BF. For some reason the origin type gets set to Q-JUN (the loss data came in as Q-DEC with no problem).

import pandas as pd
import chainladder as cl

prem = [1000,1200,1300,1400,1500,1600,1700,1800]
dates = ["2015-07-01","2015-10-01","2016-01-01","2016-04-01","2016-07-01","2016-10-01","2017-01-01","2017-04-01"]
premium_df = pd.DataFrame(data={"Premium":prem,"Date":dates})
premium_df["val"] = "2017-12-31"
premium_tri = cl.Triangle(
    data=premium_df,
    origin="Date",
    development="val",
    columns="Premium",
    cumulative=False,
)
print(premium_tri.origin)

jbogaardt commented 2 years ago

I thought I got all the dates that are off-cycle from traditional calendar periods. Admittedly these types of triangles haven't been tested nearly enough. Thanks for reporting this. It will be a good unit test.
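
For readers unfamiliar with the labels, here is a minimal sketch of what "off-cycle" means for the first origin date in the report. It assumes the Q-DEC and Q-JUN labels follow the pandas convention of naming a quarterly frequency after the month in which the fiscal year ends, and of labelling a fiscal year by the calendar year in which it ends. The helper `fiscal_quarter` is hypothetical, uses only the standard library, and is not part of chainladder.

```python
from datetime import date
from typing import Tuple


def fiscal_quarter(d: date, fy_end_month: int = 12) -> Tuple[int, int]:
    """Return (fiscal_year, quarter) for a fiscal year ending in fy_end_month.

    Hypothetical helper mirroring the assumed pandas convention:
    fy_end_month=12 corresponds to Q-DEC, fy_end_month=6 to Q-JUN.
    """
    # Number of whole months elapsed since the fiscal year started (0..11).
    months_into_fy = (d.month - fy_end_month - 1) % 12
    quarter = months_into_fy // 3 + 1
    # Months after the fiscal year-end month belong to the next fiscal year.
    fiscal_year = d.year + (1 if d.month > fy_end_month else 0)
    return fiscal_year, quarter


first_origin = date(2015, 7, 1)  # first premium date in the report
print(fiscal_quarter(first_origin, fy_end_month=12))  # (2015, 3) under Q-DEC
print(fiscal_quarter(first_origin, fy_end_month=6))   # (2016, 1) under Q-JUN
```

Under these assumptions the same date is 2015Q3 on a calendar basis but 2016Q1 on a June year-end basis, which is why a premium triangle anchored at Q-JUN does not line up with a Q-DEC loss triangle.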

henrydingliu commented 2 years ago

I tried to locate the issue, but got lost in the odims and the split stuff. any direction would be great

It also isn't the only roadblock for independently constructing weight triangles. Since sample weights are required to be triangles, and have the same grain, I've had to create dummy diagonals to accommodate.

import pandas as pd
import chainladder as cl

prem = [1000,1200,1300,1400,1500,1600,1700,1800]
dates = ["2015-07-01","2015-10-01","2016-01-01","2016-04-01","2016-07-01","2016-10-01","2017-01-01","2017-04-01"]
premium_df = pd.DataFrame(data={"Premium":prem,"Date":dates})
premium_df["val"] = "2017-12-31"
premium_tri = cl.Triangle(
    data=premium_df,
    origin="Date",
    development="val",
    columns="Premium",
    cumulative=False,
)
print(premium_tri.origin)
print(premium_tri.development_grain)
premium_tri = premium_tri.grain("OQDQ")
print(premium_tri.development_grain)
premium_df_copy = premium_df.copy()
premium_df_copy["val"] = "2017-10-31"
premium_tri_copy = cl.Triangle(
    data=pd.concat([premium_df,premium_df_copy]),
    origin="Date",
    development="val",
    columns="Premium",
    cumulative=False,
)
print(premium_tri_copy.development_grain)
premium_tri_copy = premium_tri_copy.grain("OQDQ")
print(premium_tri_copy.development_grain)

kennethshsu commented 2 years ago

@henrydingliu, did you have trouble fixing the source code or were you trying to get around it? The problem is that the declaration/instantiation of the Triangle object isn't correct.

Do you intend to have development_grain as monthly, quarterly, or annual? How would the Triangle object know which one to use? For "2017-12-31", it can be all three, but for "2017-10-31", it can be monthly or quarterly.
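
The ambiguity can be sketched with the standard library alone. The hypothetical helper `candidate_grains` below lists the grains a single valuation date is consistent with, under the simplifying assumption that all periods are anchored to the calendar year. With off-cycle anchoring, October could also close a quarter, as noted above. It is an illustration only, not chainladder's inference logic.

```python
import calendar
from datetime import date
from typing import List


def candidate_grains(valuation: date) -> List[str]:
    """List the development grains a lone valuation date could represent.

    Hypothetical helper; assumes calendar-year anchoring (quarters end in
    Mar/Jun/Sep/Dec, half-years in Jun/Dec, years in Dec).
    """
    last_day = calendar.monthrange(valuation.year, valuation.month)[1]
    if valuation.day != last_day:
        return []  # not a period end at any of these grains
    grains = ["M"]  # every month-end closes a month
    if valuation.month % 3 == 0:
        grains.append("Q")
    if valuation.month % 6 == 0:
        grains.append("S")  # semi-annual
    if valuation.month == 12:
        grains.append("A")
    return grains


print(candidate_grains(date(2017, 12, 31)))  # ['M', 'Q', 'S', 'A']
print(candidate_grains(date(2017, 10, 31)))  # ['M']
```

With only one diagonal there is no second valuation date to measure a step against, so nothing in the data itself narrows the list down to a single grain.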

henrydingliu commented 2 years ago

having trouble fixing the source code. got lost around line 186.

in my example, i intend for development grain to be quarterly. the issue is, if i pass in a single diagonal, I'm unable to set the development grain to quarterly. i had to add a dummy second diagonal in order to get to quarterly, so it's compatible to use with the BF estimator.

kennethshsu commented 2 years ago

@jbogaardt do you want to take this one? There's a #373 that addressed #294 and #313, and I think this one is similar, so I can take a look at it. Let me know!

jbogaardt commented 2 years ago

It's all you, @kennethshsu. I may still refactor after your PR, just to make the code more legible.

jbogaardt commented 2 years ago

I fixed the 'Q-JUN' issue on this one in f821992.

As for development_grain coming out as monthly: with a single valuation date, it is impossible to know whether the development grain should be 'M', 'Q', 'S' or 'A'. We could perhaps do some inference from the raw column in the supplied pandas DataFrame, but that can get tricky. Instead, we've opted to assume the lowest grain possible ('M'), so that the end user can bring it up to a higher grain such as 'OQDQ' if they wish, like you did. Still, managing development_grain on an exposure vector is probably an unnecessary burden on the end user. As long as the origins are aligned when performing triangle arithmetic with an exposure vector, it shouldn't really matter what development_grain that exposure vector has.

However, the master branch currently fails here, putting the burden on the end-user to manage the grain:

import chainladder as cl
prism = cl.load_sample('prism')['Paid'].sum().incr_to_cum().grain('OQDM')
exposure = prism.latest_diagonal
prism = prism.grain('OQDQ')
# This doesn't work because prism.development_grain = 'Q' and exposure.development_grain = 'M'
prism / exposure

I'll issue an additional fix to allow this arithmetic to occur.

henrydingliu commented 2 years ago

i dont mind setting the exposure grain manually. the issue is that right now it's not possible when the exposure triangle has a single valuation.

jbogaardt commented 2 years ago

Ok, either way, the exception handling has been relaxed on that one. I believe master should now work as you intend. If I missed something in the intent of this issue, let me know.

kennethshsu commented 2 years ago

Wow you guys are killing it!

@henrydingliu why don't you combine the premium data and the loss data into a single pd.DataFrame and pass both through the columns argument of cl.Triangle()? Even if your premium doesn't develop, I always get the data into a Triangle object along with the loss data.

henrydingliu commented 2 years ago

@kennethshsu yeah that's been my workaround too. then i got stuck trying to make countcl.ultimate * detrended_selected_sev work as sample_weight

kennethshsu commented 2 years ago

Can you post the whole code? Also, does it work with @jbogaardt's latest code?

henrydingliu commented 2 years ago

when i pip install, i seem to be installing an older master

kennethshsu commented 2 years ago

You would have to run a development environment with the latest commit. You can't use the pip or conda releases, since those changes haven't been released yet.

henrydingliu commented 2 years ago

f71f390 didn't fix the issue. However, I have plenty of efficient workarounds at this point. happy to close the issue.

> For development_grain being monthly, it is impossible to know with a single valuation date whether the development grain should be 'M', 'Q', 'S' or 'A', we could perhaps do some inference from the raw column in the supplied pandas dataframe, but that can get tricky.

Actually, I found a way to tap into the existing development grain inference, via calling to_period(freq="Q") on the origin series.

import pandas as pd
import chainladder as cl

df = pd.DataFrame()
df["DOL"] = pd.date_range(start="1/1/2008", periods=120, freq="M")
df["Dev"] = "2019-04-30"
df["Premium"] = 1000000
tri = cl.Triangle(
    data=df,
    origin="DOL",
    development="Dev",
    columns="Premium",
    cumulative=True,
)
print(tri.origin_grain)
print(tri.development_grain)
df["DOL_Q"] = df["DOL"].dt.to_period(freq="Q")
tri = cl.Triangle(
    data=df,
    origin="DOL_Q",
    development="Dev",
    columns="Premium",
    cumulative=True,
)
print(tri.origin_grain)
print(tri.development_grain)

jbogaardt commented 2 years ago

Thanks for the additional example. I think it should work with the BF method now on master.