X-DataInitiative / tick

Module for statistical learning, with a particular emphasis on time-dependent modelling
https://x-datainitiative.github.io/tick/
BSD 3-Clause "New" or "Revised" License

Help needed for HawkesEM #430

Closed charlottedion closed 4 years ago

charlottedion commented 4 years ago

Hello! Thank you for the great library! I am trying to use the HawkesEM method on a network of 249 subjects. Do you think the method can handle it? (On my computer it takes a very long time.) My main concern is visualising the result, since I can't plot all the interaction functions. Is it possible to draw a colored matrix, as in the financial example? Or do you have another idea? (My goal is to find the main connections between the 249 subjects, reducing the dimension to around 10 if possible.) Thank you for your help, Charlotte

Mbompr commented 4 years ago

Hello @charlottedion, no, I don't think HawkesEM can scale to such high dimensions. The complexity would be too high: you would need to estimate 249 x 249 x kernel_size parameters, so around 700K. I see only three methods to compute Hawkes in such high dimensions

Any Hawkes process fitted with one of these methods can be passed to the function plot_hawkes_kernel_norms, which plots the "colored matrix" containing the kernel norms.
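With 249 nodes the colored matrix may still be hard to read, so ranking its entries can help to spot the main connections. The snippet below is only a minimal sketch: `top_connections` is a hypothetical helper, not part of tick, and the matrix is made of toy values. In practice the matrix is assumed to come from the fitted learner (for instance through its `get_kernel_norms()` method), where, as far as tick's convention goes, entry (i, j) describes the influence of node j on node i.

```python
def top_connections(norms, k=10):
    """Return the k largest entries of a square matrix as (value, i, j) tuples.

    Hypothetical helper for illustration; it is not part of tick.
    """
    entries = [
        (value, i, j)
        for i, row in enumerate(norms)
        for j, value in enumerate(row)
    ]
    # Sort by kernel norm, strongest connection first.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:k]


# Toy 3 x 3 matrix of kernel norms (made-up values, not real results).
toy_norms = [
    [0.10, 0.02, 0.00],
    [0.40, 0.05, 0.01],
    [0.00, 0.30, 0.20],
]

strongest = top_connections(toy_norms, k=3)
for value, i, j in strongest:
    print("node {} -> node {}: norm {:.2f}".format(j, i, value))
```

Keeping only the few strongest pairs is one simple way to reduce a 249 x 249 matrix to a short list of candidate connections; thresholding the norms would be an alternative.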

charlottedion commented 4 years ago

Thank you @Mbompr :) But in HawkesSumExpKern I still don't understand what I should put in "decays". (A vector of size 249 with random values?)

Mbompr commented 4 years ago

In decays you definitely do not want a vector of size 249. Instead, pass the set of decays that will constitute the kernel basis.

Namely, if you have three decay values beta_1 = 0.3, beta_2 = 1, beta_3 = 3, then for each kernel i, j you will learn alpha_1^(ij) corresponding to beta_1, alpha_2^(ij) corresponding to beta_2 and alpha_3^(ij) corresponding to beta_3.
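To make the role of the decays more concrete, here is a rough sketch in plain Python. It assumes the sum-of-exponentials parametrization described in tick's documentation, phi_ij(t) = sum_u alpha_u^(ij) * beta_u * exp(-beta_u * t) for t >= 0. The alpha values are made up for illustration, and `sum_exp_kernel` and `kernel_norm` are hypothetical helpers, not tick functions.

```python
import math


def sum_exp_kernel(t, alphas, decays):
    """Evaluate phi(t) = sum_u alpha_u * beta_u * exp(-beta_u * t); zero for t < 0."""
    if t < 0:
        return 0.0
    return sum(a * b * math.exp(-b * t) for a, b in zip(alphas, decays))


def kernel_norm(alphas):
    """Integral of phi over [0, inf).

    Each basis function beta_u * exp(-beta_u * t) integrates to 1,
    so under the assumed parametrization the norm is the sum of the alphas.
    """
    return sum(alphas)


decays = [0.3, 1.0, 3.0]        # shared basis: beta_1, beta_2, beta_3
alphas_ij = [0.10, 0.05, 0.20]  # made-up weights for one pair (i, j)

value_at_zero = sum_exp_kernel(0.0, alphas_ij, decays)
norm_ij = kernel_norm(alphas_ij)

# Number of alpha coefficients for 249 nodes and 3 decays.
n_alphas = 249 * 249 * len(decays)
print(value_at_zero, norm_ij, n_alphas)
```

The decays are shared by all pairs of nodes, which is why their number does not depend on the 249 subjects; only the alpha coefficients are learned per pair.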

Here is a small script, working on the finance dataset, that tries several combinations of decays and keeps only the best one.

import numpy as np
import itertools

from tick.dataset import fetch_hawkes_bund_data
from tick.hawkes import HawkesSumExpKern
from tick.plot import plot_hawkes_kernel_norms

timestamps_list = fetch_hawkes_bund_data()

best_score = -1e100
# 6 log-spaced candidate decays between 1 and 1e6; C(6, 3) = 20 triplets are tried.
decay_candidates = np.logspace(0, 6, 6)
for decays in itertools.combinations(decay_candidates, 3):
    decays = np.array(decays)  # each iteration tests a different set of 3 decays
    hawkes_learner = HawkesSumExpKern(decays, verbose=False, max_iter=10000,
                                      tol=1e-10)
    # Lift the positivity constraint on the coefficients
    # (private attribute, may change between tick versions).
    hawkes_learner._prox_obj.positive = False
    hawkes_learner.fit(timestamps_list)

    hawkes_score = hawkes_learner.score()
    if hawkes_score > best_score:
        print('obtained {}\n with {}\n'
              .format(hawkes_score, decays))
        best_hawkes = hawkes_learner
        best_score = hawkes_score

plot_hawkes_kernel_norms(best_hawkes, show=True)

charlottedion commented 4 years ago

Thanks a lot, now I get it and it works! Have a nice day, Charlotte

Mbompr commented 4 years ago

Cool! Don't hesitate to keep us posted. Have a nice day!