Closed: johcarter closed this issue 12 months ago
The mathematical model for partitioning the variance is a random effects model, which requires both the hazard and vulnerability factors to be random. In the Oasis framework, however, the hazard element is fixed rather than random: event occurrences are assigned to years in a fixed timeline. This means that the hazard component of the AAL variance, as computed by the formula in the attached paper, does not reduce with increasing samples, so the formula does not accurately predict the overall variance of the AAL estimate, which does reduce in proportion to the number of samples under the CLT.
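A minimal sketch of the effect (toy severity and damage distributions, not Oasis data or code): with the event timeline frozen, the spread of the AAL estimate across repeated sample sets keeps shrinking as roughly 1/sqrt(M), so any formula containing a hazard variance term that does not shrink with M will overstate it.

```python
import numpy as np

rng = np.random.default_rng(0)

I = 1_000   # periods in the fixed timeline
R = 200     # repeated runs, resampling the vulnerability draws only

# Fixed hazard: one event severity per period, drawn once and then frozen
# (roughly 1/3 of periods get an event; the rest are zero-loss periods).
severity = np.where(rng.random(I) < 1 / 3, rng.lognormal(0.0, 1.0, I), 0.0)

for M in (1, 10, 100):
    aal = np.empty(R)
    for r in range(R):
        damage = rng.gamma(2.0, 0.5, size=(I, M))   # vulnerability samples
        aal[r] = (severity[:, None] * damage).mean()
    # Shrinks like 1/sqrt(M) because the hazard is identical across runs.
    print(f"M = {M:3d}: sd of AAL across runs = {aal.std(ddof=1):.5f}")
```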
The random effects model described in anova_technique_methodology may be suitable for other cat loss modelling calculation frameworks where the hazard element is also random, so it is attached here for future reference.
Thank you to Radek @OasisLMF/impactforecasting for getting to the bottom of this.
In terms of the convergence report, I propose dropping the ANOVA fields and estimating the standard error of the AAL using the standard deviation calculated from all annual loss samples. This is s / sqrt(I*M), where s is the sample standard deviation of the annual losses for i = 1, 2, ..., I*M (I being the total number of periods and M being the number of samples). Updated proposed reports attached.
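A minimal sketch of the proposed estimator, assuming the annual losses are available as a flat numpy array of I*M values:

```python
import numpy as np

def aal_standard_error(annual_losses: np.ndarray) -> float:
    """Standard error of the AAL: s / sqrt(I*M), where the array holds
    one annual loss per (period, sample) pair, i.e. I*M values in total."""
    s = annual_losses.std(ddof=1)   # sample standard deviation
    return float(s / np.sqrt(annual_losses.size))
```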
ORD_convergence_tables_v6.xlsx
anova_technique_methodology_v1.pdf
FYI @hchagani-oasislmf
Drop ANOVA fields from output report
Issue Description
Advice from stats gurus would be very welcome on this problem.
The standard error of the AAL estimate in the new report seems to overstate the observed sampling error for a given sample size. Using PiWind with 10 locations, a bootstrap of the AAL calculated 100 times with 10 samples produces a standard deviation of 0.6%, versus an estimated standard error of 7.8%. While this is great news for the user, it means the CALT (Convergence in Average Loss Table) report is pretty useless as a predictive tool for AAL convergence.
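For reference, the bootstrap check was along these lines (a sketch; `run_aal` is a hypothetical hook that runs the model once with fresh samples and returns the AAL estimate and the report's estimated standard error):

```python
import numpy as np

def bootstrap_aal_check(run_aal, n_runs: int = 100) -> None:
    """Repeat the AAL calculation and compare the observed spread of the
    estimates with the standard error the report predicts."""
    results = np.array([run_aal() for _ in range(n_runs)])
    aals, reported_ses = results[:, 0], results[:, 1]
    print(f"observed sd of AAL estimates: {aals.std(ddof=1):.4g}")
    print(f"mean reported standard error: {reported_ses.mean():.4g}")
```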
I think the issue is a violation of the i.i.d. assumption, in particular the identically distributed part. Each annual loss observation comes from a particular period containing particular events, and different events have different loss variation: the bigger the event, the bigger the variation in loss. At the other end of the spectrum, around 2/3 of periods have no events and therefore zero loss variation. This is a case of extreme heteroscedasticity.
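This is easy to see by grouping the annual loss samples by period, as in this sketch (assuming a period-major I x M layout of the flat sample array, with M > 1):

```python
import numpy as np

def per_period_spread(annual_losses: np.ndarray, I: int, M: int) -> None:
    """Per-period sample standard deviations, to expose heteroscedasticity:
    zero-event periods have zero spread, big-event periods a large one."""
    sd_by_period = annual_losses.reshape(I, M).std(axis=1, ddof=1)
    print(f"share of periods with zero loss variation: "
          f"{(sd_by_period == 0).mean():.0%}")
    print(f"max / mean period sd: "
          f"{sd_by_period.max():.3g} / {sd_by_period.mean():.3g}")
```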
With a bit of googling I have found some methods that correct for model misspecification / i.i.d. violation, e.g.:
https://stat-analysis.netlify.app/the-iid-violation-and-robust-standard-errors.html
Further investigation is needed to improve the estimated standard error and make this report useful.
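One candidate worth testing (my assumption, not something the linked note prescribes for this case) is a cluster-robust standard error that treats the M samples of each period as one cluster, since samples in the same period share the same events. Note that this treats the period draw as random, so it would need checking against the fixed-timeline point above.

```python
import numpy as np

def cluster_robust_se(annual_losses: np.ndarray, I: int, M: int) -> float:
    """CR0 cluster-robust standard error of the mean annual loss, with the
    M samples of each period as one cluster (period-major flat layout)."""
    n = annual_losses.size                          # n = I * M
    resid = annual_losses - annual_losses.mean()
    cluster_sums = resid.reshape(I, M).sum(axis=1)  # residual sum per period
    return float(np.sqrt((cluster_sums ** 2).sum()) / n)
```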
Version / Environment information
1.26