Open lbelzile opened 1 year ago
The function also returns a single observation in the data
component of the list, even with r=-1
(but it does use all of the data for fitting...)
@byoungman Is there a chance we could get GAM modelling for the inhomogeneous Poisson point process in a future version of evgam
? This is oftentimes a more stable parametrization than the generalized Pareto model.
Hi. I tried to respond to this a few weeks ago by replying to an email. Did it arrive? If not, I'll respond here.
'I'm happy to look into this. I found it tricky working out how to robustly incorporate the integral in the inhomogeneous Poisson point process likelihood in a user-friendly way. Did you have anything in mind? I am also author of the ppgam package, by the way, which fits inhomogeneous Poisson point processes, and is perhaps one approach to handling the integral, and could be transferred to evgam.
Hum, it seems I had overlooked the question of the intensity measure for the nonstationary case (nobody ever mentions this challenge when recommending using the inhomogeneous Poisson process formulation...) Casson and Coles (1999) circumvented this altogether using a conditional independence assumption, but that just mean they ignored the problem...
What exactly is the domain we have to integrate over? The IPP intensity form without covariate is simply a function of $y$, but now this must be GEV for $y \mid \boldsymbol{x}$ so we need to approximate the integral over both $\boldsymbol{x}$ and $y$.
The case I have in mind is mostly for models with covariates, although space-time extensions could also be considered.
From what I gather from section 3.4.1 of your Environmetrics paper,
ppgam
uses quadrature methods. I wonder how this scale when $x$ is high-dimensional, but there are only a few exceedances. This would be however be an option, which has the advantage that it would probably be more straightforward to implement. One could always check the numerical accuracy in simple examples.My understanding is that the integral needs to be over all the covariate dimensions, which is approximated by quadrature in
ppgam
. You're right, though, that this typically won't scale well for high dimensions. However that's partly dependent on how smooth estimates are expected to be as this determines how many quadrature integration nodes are needed to give a good approximation.Do you have a model in mind that you'd like to fit? Perhaps if we can find a way to get that working fairly nicely, then some useful generalisations might become apparent. (I could perhaps even borrow some code from
ppgam
to save some time.)Cheers,
Ben.
Categorical variables are easiest to handle: we only need their frequency and we get the same intensity function conditional on them. How about starting with a simple model with a binary categorical and a continuous covariate? We could check the accuracy by Monte Carlo simulations.
Formula 5 in the JSS paper shows that the inhomogeneous Poisson point process likelihood coded in
evgam
is (essentially) that of the r-largest order statistics, but the documentation suggests we could use this for pure threshold exceedances (taking threshold $u$ rather than an order statistic, and keeping a single block of $r=n_u$ observations, where $n_u$ is the number of values that exceed the threshold.I am confused by the documentation.
In the code of the internal function
.setup.pp.data
, there is a subset argumentds <- split(data, data[, pp$id])
which fails if the listpp.args
does not containid
(which makes no sense if there was no covariate) and can return nonsense ifid
is a vector of numeric or integers (as it is possible to subset columns that don't exist).The code also returns errors with
threshold=0
ifr
is not provided, stating that there is no data...For what it's worth, I can mimic the inhomogeneous Poisson point process likelihood by creating a data set consisting only of exceedances with
r=-1
, a dummyid
with a single value.