MHaringa / insurancerating

R-package for actuarial pricing
https://mharinga.github.io/insurancerating

Bug in Univariate #6

Closed: ArendJansen closed 10 months ago

ArendJansen commented 10 months ago

bug_report_univariate.txt

The univariate() function in version 0.7.2 of insurancerating gives a data.table error when one of the categories of the column passed as argument 'x' has a total of 0 for 'nclaims' or for 'severity'.

I've attached a text file with a code example.

MHaringa commented 10 months ago

Hi ArendJansen, I believe there might be a mistake in the invocation of insurancerating::univariate() in your example. The first argument is duplicated here. The following code doesn't produce an error message. Best, Martin

library(insurancerating)

dataset <- insurancerating::MTPL2 |> 
  dplyr::mutate(nclaims = ifelse(area == 0, 0, nclaims),
                amount = ifelse(area == 0, 0, amount))

insurancerating::univariate(
    dataset,
    x = area, 
    severity = amount, 
    nclaims = nclaims,
    exposure = exposure, 
    premium = premium) 
#>    area  amount nclaims   exposure premium frequency average_severity
#> 1:    2 4063270      98  818.53973   51896 0.1197254         41461.94
#> 2:    3 7945311     113  764.99178   49337 0.1477140         70312.49
#> 3:    1 6896187     146 1065.74795   65753 0.1369930         47234.16
#> 4:    0       0       0   13.30685     902 0.0000000              NaN
#>    risk_premium loss_ratio average_premium
#> 1:     4964.047    78.2964        63.40071
#> 2:    10386.139   161.0416        64.49350
#> 3:     6470.749   104.8802        61.69658
#> 4:        0.000     0.0000        67.78464

Created on 2023-11-29 with reprex v2.0.2
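
For readers wondering where the NaN in row 4 comes from, the ratio columns can be recomputed by hand from the totals printed above. This is only a sketch: the formulas are inferred from the printed numbers, not taken from the package source, and the data frame name agg is made up for the illustration.

```r
# Totals per area, copied from the univariate() output above
agg <- data.frame(
  area     = c(2, 3, 1, 0),
  amount   = c(4063270, 7945311, 6896187, 0),
  nclaims  = c(98, 113, 146, 0),
  exposure = c(818.53973, 764.99178, 1065.74795, 13.30685),
  premium  = c(51896, 49337, 65753, 902)
)

# Assumed definitions, inferred from the printed output
agg$frequency        <- agg$nclaims / agg$exposure
agg$average_severity <- agg$amount  / agg$nclaims   # 0 / 0 gives NaN for area 0
agg$risk_premium     <- agg$amount  / agg$exposure
agg$loss_ratio       <- agg$amount  / agg$premium
agg$average_premium  <- agg$premium / agg$exposure

print(agg)
```

Only average_severity divides by nclaims, so it is the only column that becomes NaN when a category has no claims; the other ratios divide by exposure or premium, which stay positive here.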

ArendJansen commented 10 months ago

I see you're right, my bad. However, when I call the autoplot function directly after this code:

library(insurancerating)

dataset <- insurancerating::MTPL2 |> 
  dplyr::mutate(nclaims = ifelse(area == 0, 0, nclaims),
                amount = ifelse(area == 0, 0, amount))

dataset |> 
  insurancerating::univariate(
    x = area, 
    severity = amount, 
    nclaims = nclaims,
    exposure = exposure, 
    premium = premium) |> 
  insurancerating::autoplot()

I do get an error, because the average_severity of one category is now equal to NaN.
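
As a side note, here is a small base R sketch of how such a NaN arises and how it could be detected before plotting. It does not use insurancerating at all; the variable names are hypothetical and only serve the illustration.

```r
# Claim totals for a category without any claims
total_amount  <- 0
total_nclaims <- 0

# 0 / 0 is NaN in R, whereas a positive amount divided by 0 would be Inf
severity_zero_claims <- total_amount / total_nclaims
severity_no_count    <- 1000 / total_nclaims

# is.nan() is TRUE only for NaN; is.na() is TRUE for both NA and NaN;
# is.finite() is FALSE for NaN, NA, Inf and -Inf
checks <- c(
  nan_is_nan    = is.nan(severity_zero_claims),
  nan_is_na     = is.na(severity_zero_claims),
  inf_is_nan    = is.nan(severity_no_count),
  inf_is_finite = is.finite(severity_no_count)
)
print(checks)
```

A check like is.finite() on the ratio column would flag every row a plotting function cannot draw, whether the value is NaN, NA or Inf.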

MHaringa commented 10 months ago

Thanks for catching that. autoplot.univariate() now generates a plot even when there are missing values in the rows. This enhancement is included in the development version of the package and will be part of version 0.7.3.

library(insurancerating)

dataset <- insurancerating::MTPL2 |> 
  dplyr::mutate(nclaims = ifelse(area == 0, 0, nclaims),
                amount = ifelse(area == 0, 0, amount))

x <- dataset |> 
  insurancerating::univariate(
    x = area, 
    severity = amount, 
    nclaims = nclaims,
    exposure = exposure, 
    premium = premium) 

x
#>    area  amount nclaims   exposure premium frequency average_severity
#> 1:    2 4063270      98  818.53973   51896 0.1197254         41461.94
#> 2:    3 7945311     113  764.99178   49337 0.1477140         70312.49
#> 3:    1 6896187     146 1065.74795   65753 0.1369930         47234.16
#> 4:    0       0       0   13.30685     902 0.0000000              NaN
#>    risk_premium loss_ratio average_premium
#> 1:     4964.047    78.2964        63.40071
#> 2:    10386.139   161.0416        64.49350
#> 3:     6470.749   104.8802        61.69658
#> 4:        0.000     0.0000        67.78464

insurancerating::autoplot(x)
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
#> Warning: Removed 1 row containing missing values (`geom_line()`).

Created on 2023-12-16 with reprex v2.0.2