X-DataInitiative / tick

Module for statistical learning, with a particular emphasis on time-dependent modelling
https://x-datainitiative.github.io/tick/
BSD 3-Clause "New" or "Revised" License

Issues with parameter inference using likelihood as a goodness-of-fit measure for HawkesExpKern #518

Open oresthes opened 9 months ago

oresthes commented 9 months ago

Hi!

I am using HawkesExpKern to infer parameters on a simulated process with known parameters. It works fine with least-squares as a goodness-of-fit measure, but it struggles with likelihood: it errors out under most solvers, and with svrg it fails to converge.

To replicate the issue:

1) Simulate data

import numpy as np
from tick.hawkes import HawkesExpKern, SimuHawkesExpKernels

adjacency = np.array([[0.8]])
decays = np.array([[0.025]])
baseline = np.array([0.01])
run_time = 2922

hawkes_simulation_univariate = SimuHawkesExpKernels(
    adjacency=adjacency,
    decays=decays,
    baseline=baseline,
    end_time=run_time,
    max_jumps=None,
    verbose=True,
    seed=117,
    force_simulation=False
)
hawkes_simulation_univariate.simulate()

2) Infer using HawkesExpKern

sample_hawkes_learner_loglik = HawkesExpKern(
    decays=hawkes_simulation_univariate.decays[0][0],
    gofit='likelihood',
    solver='svrg',
    step=None,
    tol=1e-05,
    max_iter=10000,
    verbose=True,
    print_every=50,
    record_every=50
)

3) Fit model

sample_hawkes_learner_loglik.fit(hawkes_simulation_univariate.timestamps)

The response I get:

SVRG step needs to be tuned manually

Launching the solver SVRG...
  n_iter  |    obj    |  rel_obj 
    10000 |       nan |       nan
Done solving using SVRG in 0.5193345546722412 seconds
<tick.hawkes.inference.hawkes_expkern_fixeddecay.HawkesExpKern at 0x7f17dd4fce80>

If I repeat the same process using AGD as the solver, it errors out as follows:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 1
----> 1 sample_hawkes_learner_loglik.fit(sample_simulation.timestamps)

File /.venv/lib/python3.8/site-packages/tick/hawkes/inference/base/learner_hawkes_param.py:210, in LearnerHawkesParametric.fit(self, events, start)
    207     coeffs_start = np.ones(model_obj.n_coeffs)
    209 # Launch the solver
--> 210 coeffs = solver_obj.solve(coeffs_start)
    212 # Get the learned coefficients
    213 self._set("coeffs", coeffs)

File /.venv/lib/python3.8/site-packages/tick/solver/base/first_order.py:283, in SolverFirstOrder.solve(self, x0, step)
    280 if self.prox is None:
    281     raise ValueError('You must first set the prox using '
    282                      '``set_prox``.')
--> 283 solution = Solver.solve(self, x0, step)
    284 return solution

File /.venv/lib/python3.8/site-packages/tick/solver/base/solver.py:109, in Solver.solve(self, *args, **kwargs)
    107 def solve(self, *args, **kwargs):
    108     self._start_solve()
--> 109     self._solve(*args, **kwargs)
    110     self._end_solve()
    111     return self.solution
...
    120     r"""loss(Model self, ArrayDouble const & coeffs) -> double"""
--> 121     return _hawkes_model.Model_loss(self, coeffs)

RuntimeError: The sum of the influence on someone cannot be negative. Maybe did you forget to add a positive constraint to your proximal operator

What makes it even stranger is that I can find the maximum through brute force. Below is a plot of the likelihood function (computed with the score method of the class). The maximum sits a bit away from the simulation parameters, but it does exist.

[figure: likelihood surface from the brute-force scan]
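
For completeness, a minimal sketch of the kind of grid scan behind that plot. The grid bounds are illustrative, and I am assuming score accepts explicit baseline / adjacency arguments as in the HawkesExpKern docs (check your tick version):

import numpy as np

# Brute-force evaluation of the likelihood surface on an illustrative grid.
baseline_grid = np.linspace(0.005, 0.05, 50)
adjacency_grid = np.linspace(0.05, 0.95, 50)

timestamps = hawkes_simulation_univariate.timestamps
scores = np.array([
    [sample_hawkes_learner_loglik.score(timestamps,
                                        baseline=np.array([mu]),
                                        adjacency=np.array([[a]]))
     for a in adjacency_grid]
    for mu in baseline_grid
])

# Location of the brute-force maximum.
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print('max at baseline=%.4f, adjacency=%.3f'
      % (baseline_grid[i], adjacency_grid[j]))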

oresthes commented 8 months ago

I must add that it seems to be an issue with the solver (or with what is being passed to it), since the score method appears to work correctly for the likelihood-based learner.

Mbompr commented 8 months ago

Hello @oresthes,

Actually, optimizing the log-likelihood of Hawkes processes with gradient descent is very hard due to the shape of the optimization surface (very flat near the optimum, very peaky near the boundaries). Hence, the classical optimization algorithms (AGD, SVRG, etc.) that rely on the gradient-Lipschitz assumption have a high chance of failing (see https://arxiv.org/abs/1807.03545).
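
For intuition, a standard fact (not tick-specific): for a univariate Hawkes process with intensity $\lambda(t)$ observed on $[0, T]$, the log-likelihood is

$$\ell(\theta) = \sum_{k} \log \lambda(t_k) - \int_0^T \lambda(t)\,dt$$

and the $\log \lambda(t_k)$ terms diverge as the intensity approaches zero, which is why the surface is so peaky near the boundary of the admissible parameter region.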

You can try several hacks to make it work (a rough sketch of hacks 1 and 3 follows the list):

  1. Fit with least squares and use the obtained point as a starting point
  2. Use positive=True in your penalty
  3. Try smaller step sizes

See also https://github.com/X-DataInitiative/tick/issues/416#issuecomment-553423434
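
A minimal, untested sketch combining hacks 1 and 3, reusing the variable names from the snippets above. The start= argument is the one visible in the traceback's fit signature, and the step value is just a starting guess to tune; for hack 2 (a positive constraint on the prox), see the linked comment:

# Hack 1: fit with least squares first to get a sensible starting point.
ls_learner = HawkesExpKern(
    decays=hawkes_simulation_univariate.decays[0][0],
    gofit='least-squares'
)
ls_learner.fit(hawkes_simulation_univariate.timestamps)

# Hacks 1 + 3 together: warm-start the likelihood fit from the
# least-squares coefficients and use a small manual step.
llh_learner = HawkesExpKern(
    decays=hawkes_simulation_univariate.decays[0][0],
    gofit='likelihood',
    solver='agd',
    step=1e-4,      # hack 3: a deliberately small step, to be tuned
    max_iter=10000,
    tol=1e-05
)
llh_learner.fit(hawkes_simulation_univariate.timestamps,
                start=ls_learner.coeffs)  # hack 1: warm start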