Closed Jeanieiszo1 closed 6 years ago
Hi Jean-Claude,
thank you for your interest in the tick library!
You're probably seeing “discontinuities” in the log-log plot. The reason is that your kernels take on negative values, so the log is NaN and matplotlib will not plot these points on a log-log scale. Indeed, the non-parametric estimation procedure that uses the conditional law can find negative-valued kernels. This is actually a powerful feature of the algorithm, as it allows modeling (moderate) inhibition effects.
The mathematical reason we can find negative kernels is as follows. For simplicity, let's stick to the 1D case. The first step of the procedure is to empirically estimate the conditional law

g(t) dt = E[dN_t | dN_0 = 1] - Lambda dt

where N is the counting process and Lambda = E[dN_t] / dt is the average intensity of the process.
Now, intuitively, if some inhibition effect is present for some t, we will have

E[dN_t | dN_0 = 1] < Lambda dt

i.e. you expect fewer events than average, and thus g(t) < 0 for some t. It has been proved in [1] that this also implies that the kernel will take negative values for some t. See also [3] for all the details about the estimation method.
In practice with real data, one can find negative values of g(t) for two main reasons:
There are genuine inhibition effects in the data. This is for example the case for the negative relation found in [2] between P+ and P+ events: we know the mid-price is mean-reverting, and thus another P+ event is less likely just after a P+ event has taken place.
Too fine a grid in the estimation of g(t). Real data come with a finite time resolution, and furthermore we don't have an infinite amount of data. Estimating g(t) is basically like estimating a conditional probability density: you divide your domain into discrete bins and count how many events fall in each bin. If you take very small bins compared to the data resolution/amount, you run the risk that many of them will be empty or scarcely populated, so E[dN_t | dN_0] < Lambda and you get negative kernels.
To give you a concrete example, suppose your data come with 1 s time resolution. This means that no two timestamps in your data will be less than 1 s apart. Now, if we estimate g(t) on a grid with millisecond bins, all the bins between 0 and 1 s will be empty and thus g(t) will be negative on this domain.
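To make the bin-size effect concrete, here is a small NumPy sketch (not tick code; the estimator below is a simplified stand-in for the library's conditional-law computation). It generates events with 1 s resolution and then estimates g(t) = E[dN_t | dN_0] / dt - Lambda on millisecond bins, so every bin below the data resolution is empty and comes out negative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Events with 1-second resolution over a 10,000 s window
# (integer timestamps, so no two events are closer than 1 s).
T = 10_000.0
events = np.unique(rng.integers(0, int(T), size=2_000)).astype(float)
lam = len(events) / T  # average intensity Lambda

# Naive estimate of g(t) = E[dN_t | dN_0] / dt - Lambda on 1 ms bins up to 2 s.
bin_width = 1e-3
bins = np.arange(0.0, 2.0 + bin_width, bin_width)
lags = (events[None, :] - events[:, None]).ravel()
lags = lags[(lags > 0) & (lags <= 2.0)]
counts, _ = np.histogram(lags, bins=bins)
g = counts / (len(events) * bin_width) - lam

# All bins strictly below the 1 s resolution are empty, so g = -Lambda there.
print(g[:999].max())  # negative: sub-resolution bins contain no lags
```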
This doesn't mean that the actual kernels have gaps; it's just that they take negative values. Here the cause is most likely point 2 above.
I would suggest the following:
As you can see, there is a lot of structure up to about 1e-3 s, so we might want the linear grid to go up to there (min_lag = 1e-3). For this data I then took delta_lag between 1/50 and 1/10, depending on how many components the Hawkes process we want to fit has. The lower the dimension, the finer the grid (smaller delta_lag) can be. Also, I set max_lag to 250-1000.
Remember that the grid used for estimation of g(t) is given by:
0, min_lag * delta_lag, min_lag * 2 * delta_lag, ..., min_lag, min_lag * exp(delta_lag), min_lag * exp(2 * delta_lag), ..., max_lag
Thus a first rule of thumb is to check that delta_lag * min_lag (the width of the first bin) is large enough that you have a sufficient number of inter-event times in each bin. In future versions of tick we are going to allow the user to pass the grid directly, so that it can be better adapted to their particular data.
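For reference, the lin-log grid described above can be built explicitly. This is only a sketch of the spacing rule (tick may construct its internal grid differently): linear steps of min_lag * delta_lag up to min_lag, then geometric steps with ratio exp(delta_lag) up to max_lag.

```python
import numpy as np

def claw_grid(min_lag, delta_lag, max_lag):
    """Sketch of the lin-log grid used to estimate the conditional law g(t)."""
    # Linear part: 0, min_lag*delta_lag, min_lag*2*delta_lag, ..., min_lag
    n_lin = int(round(1.0 / delta_lag))
    lin = min_lag * delta_lag * np.arange(n_lin + 1)
    # Log part: min_lag*exp(delta_lag), min_lag*exp(2*delta_lag), ..., max_lag
    n_log = int(np.ceil(np.log(max_lag / min_lag) / delta_lag))
    log = min_lag * np.exp(delta_lag * np.arange(1, n_log + 1))
    return np.concatenate([lin, np.minimum(log, max_lag)])

grid = claw_grid(min_lag=1e-3, delta_lag=0.1, max_lag=250.0)
print(grid[0], grid[-1])  # grid runs from 0 to max_lag
```

This makes the rule of thumb easy to check: the first bin has width min_lag * delta_lag, so you can count how many inter-event times of your data fall below it.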
Try first with a 1D process as it will be much faster and you’ll have less noise and then as you increase the dimension you adjust the parameters.
Regarding the compute parameters, i.e. n_quad, min_support, and max_support: these determine the points at which the kernel values will be computed by discretising the Wiener-Hopf system. When using the 'log' scheme the grid is given by
0, min_support * eta, min_support * 2 * eta, ..., min_support, min_support * exp(eta), min_support * exp(2 * eta), ..., max_support
with
eta = ( log(max_support) - log(min_support) + 1) / n_quad
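The quadrature grid follows the same lin-log pattern, with eta derived from the formula above. Again a sketch, not tick's internal implementation:

```python
import numpy as np

def quadrature_grid(n_quad, min_support, max_support):
    """Sketch of the 'log' quadrature grid for the Wiener-Hopf system,
    with eta = (log(max_support) - log(min_support) + 1) / n_quad."""
    eta = (np.log(max_support) - np.log(min_support) + 1.0) / n_quad
    # Linear part: 0, min_support*eta, ..., up to about min_support
    n_lin = int(round(1.0 / eta))
    lin = min_support * eta * np.arange(n_lin + 1)
    # Log part: min_support*exp(eta), min_support*exp(2*eta), ..., max_support
    n_log = int(np.ceil(np.log(max_support / min_support) / eta))
    log = min_support * np.exp(eta * np.arange(1, n_log + 1))
    return np.concatenate([lin, np.minimum(log, max_support)])

# Values used for the data discussed below
grid = quadrature_grid(n_quad=120, min_support=5e-5, max_support=1.0)
```

With a very large max_support, most of the log part of this grid sits in the kernel tail, which is why the tail noise mentioned below becomes a problem.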
For the above data I used n_quad=120, max_support=1, min_support=5e-5. Note in particular that if you set a very large value for max_support (like hundreds of seconds for high-frequency financial data) you will invariably have a lot of noise in the tail of the kernels, which can become a problem and lead to unstable values for the norm. I suggest starting with smaller values, like 1 second or less for high-frequency data. Also try lowering the number of quadrature points when you increase the dimension (e.g. 80-100 for 8D).
```python
for real in realizations:
    hawkes_learner.incremental_fit(real, compute=False)
hawkes_learner.compute()
```

so that the code correctly estimates g(t) by averaging over all realisations and computes the kernels just once at the end.
To plot kernel (i, j):

```python
plt.plot(hawkes_learner.kernels[i][j][0], hawkes_learner.kernels[i][j][1])
plt.semilogx()
plt.xlabel('t')
plt.ylabel('Kernel value')
```
To plot conditional law (i, j):

```python
plt.plot(hawkes_learner._claw_X, hawkes_learner._claw1[i][j])
plt.semilogx()
plt.xlabel('t')
plt.ylabel('Claw value')
```
[1] Rambaldi M, Bacry E, Lillo F. The role of volume in order book dynamics: a multivariate Hawkes process analysis. Quantitative Finance. 2017;17(7):999-1020.
[2] Bacry E, Jaisson T, Muzy JF. Estimation of slowly decreasing Hawkes kernels: application to high-frequency order book dynamics. Quantitative Finance. 2016;16(8):1179-1201.
[3] Bacry E, Muzy JF. First- and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Transactions on Information Theory. 2016;62(4):2184-2202.
Hi,
For a study project, I am fitting high-frequency order book data to a Hawkes model in the same manner as Bacry (2015), and I am encountering the following problem: using the non-parametric estimation based on the conditional law gives discontinuous functions (like in the example 'Fit Hawkes power law kernels').
For my project I need to compute intensities, and this requires the kernels to be evaluated at certain points, which is currently impossible. So my question is: what gives these NaNs? Is it because the conditional laws already contain NaNs? Or is it caused by a quirk in the quadrature method? I tried to find the cause in the source code, but got a bit lost. My last question: can this easily be solved by some kind of interpolation?
Any help would be greatly appreciated! :) Thanks, Jean-Claude