casact / chainladder-python

Actuarial reserving in Python
https://chainladder-python.readthedocs.io/en/latest/
Mozilla Public License 2.0

[BUG] Unable to convert between grains when using pandas==2.2.0 #494

Closed 00milsg closed 5 months ago

00milsg commented 6 months ago

Describe the bug

When converting a triangle from grain OMDM to OQDQ, every cell is converted to NaN.

To Reproduce

To reproduce the behaviour I used the following package versions. Note: when I downgrade pandas to a version below 2.2.0, the issue disappears. I will use this as a workaround for now.

pandas==2.2.0
numpy==1.26.4
chainladder==0.8.18
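
For anyone who wants to detect the affected setup programmatically, here is a minimal sketch of a version guard. The helper name `is_at_least` is hypothetical and not part of pandas or chainladder, and the 2.2.0 threshold is taken only from the observation above that versions below 2.2.0 work.

```python
def is_at_least(version, minimum=(2, 2, 0)):
    """Return True if a dotted version string is >= `minimum`.

    Hypothetical helper for illustration. Only the leading digits of the
    first three components are compared, so a pre-release such as
    "2.2.0rc1" is treated the same as "2.2.0".
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as "3.0" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= minimum


# In a real session the string would come from pandas itself, e.g.:
#   import pandas as pd
#   if is_at_least(pd.__version__):
#       print("grain conversion may be affected; see this issue")
affected = is_at_least("2.2.0")
unaffected = is_at_least("2.1.4")
```

In a pinned environment, the downgrade workaround amounts to a requirement such as `pandas<2.2.0`.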

Full code to reproduce the behaviour is attached in a text file (it's 50 lines long and I don't want to bloat this post). I used the prism example dataset and worked backwards from it to create monthly and quarterly triangles. The code is self-contained. See link below.

If you'd rather I add the example code as a comment then let me know.

chain-ladder-grain-conversion-error-reprex.txt

Results of my digging into the issue

I think this is caused by the initial dev_to_val conversion at the top of the method (possibly related to this issue?). This then flows through and upsets the calculation of the d_start variable. With earlier pandas versions, d_start is calculated correctly. With pandas >= 2.2.0, d_start gets set to the start of the previous month. This triggers the conditional block below, which sets the data to NaN.

https://github.com/casact/chainladder-python/blob/0a0d0cb9fb2db4743cf5efd8f6e34cdf5635b24b/chainladder/core/triangle.py#L689-L702

Question: Could we add a warning stating that this is what's happening (and potentially why)? We could also add an accompanying suggestion on how to correct it (filter your input data so that min(dev_month) >= min(orig_month)).
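
The suggested filter and warning could look roughly like the sketch below. It uses plain Python records rather than a DataFrame so that it runs without dependencies. The function name `drop_early_development` and the keys `origin` and `development` are assumptions for illustration, not chainladder API.

```python
import warnings
from datetime import date


def drop_early_development(rows, origin_key="origin", dev_key="development"):
    """Drop rows whose development date precedes the earliest origin date.

    Hypothetical helper: after filtering, min(development) >= min(origin).
    A warning is emitted if any rows were removed.
    """
    if not rows:
        return []
    min_origin = min(row[origin_key] for row in rows)
    kept = [row for row in rows if row[dev_key] >= min_origin]
    dropped = len(rows) - len(kept)
    if dropped:
        warnings.warn(
            f"Dropped {dropped} row(s) with a development date before the "
            f"earliest origin date ({min_origin}); leaving them in may "
            "produce NaN values after a grain change.",
            UserWarning,
            stacklevel=2,
        )
    return kept


rows = [
    {"origin": date(2020, 1, 1), "development": date(2019, 12, 1), "count": 1},
    {"origin": date(2020, 1, 1), "development": date(2020, 1, 1), "count": 2},
    {"origin": date(2020, 2, 1), "development": date(2020, 3, 1), "count": 3},
]

# Capture the warning so the example can inspect it instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    filtered = drop_early_development(rows)
```

On a DataFrame the same filter is a single boolean mask, for example `df[df["development"] >= df["origin"].min()]`, again with assumed column names.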

Expected behavior

When converting a triangle from grain OMDM to OQDQ the total sum of the triangle contents should remain the same. Please see below for a potential unit test.

# Test that triangles contain the same starting information
qtr_sum = qtr_triangle['reportedCount'].sum().sum()
mth_sum = mth_triangle['reportedCount'].sum().sum()
assert qtr_sum == mth_sum, "Triangles are not equal before grain change."

# Test that we still get the same answers when we convert the monthly tri to quarterly
mth_conv_sum = mth_triangle['reportedCount'].grain('OQDQ').sum().sum()
assert mth_conv_sum == mth_sum, "Triangles are not equal after grain change."

jbogaardt commented 6 months ago

@00milsg, thanks for reporting, providing a fully functioning reprex, and doing the legwork on the root cause. This is a fantastic bug report. I actually think it's a bug in pandas==2.2.0, or at the very least an undocumented deprecation. I've opened a bug over there to confirm: https://github.com/pandas-dev/pandas/issues/57781

00milsg commented 6 months ago

No problem @jbogaardt, happy to help :)