X-DataInitiative / tick

Module for statistical learning, with a particular emphasis on time-dependent modelling
https://x-datainitiative.github.io/tick/
BSD 3-Clause "New" or "Revised" License

Events data input format for the fit function of HawkesExpKern #515

Open: ibrahimelumari opened this issue 1 year ago

ibrahimelumari commented 1 year ago

Hi, I've been trying to train a model with the fit method of HawkesExpKern, but the data structure of my events list seems to be wrong.

I'm working with medical data for a group of patients: each patient has a set of events that occur during a surgery. I have built the list of Hawkes process event timestamps as a Python list containing one sub-list per patient. Each sub-list holds 5 numpy arrays, one per event category, giving the occurrence times of that category on the surgery timeline (starting from 0). Here's an extract of the list for the first 2 patients:

[
[array([   0,  150, 1440, 6960, 8040]), array([1140, 2160, 6480]), array([2190, 2640, 2970, 3210, 4380]), array([ 810, 1200, 1890]), array([ 780, 2100, 3510])],
 [array([   0,  150, 1500, 6660, 8100]), array([1170, 2040, 6300]), array([2070, 2520, 2850, 3090, 4200]), array([ 810, 1260, 1950]), array([ 780, 1980, 3390])],
....
]
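The nested layout described above (one list per patient, one array per event category) can be sanity-checked before calling fit. The sketch below uses a hypothetical helper, check_realizations, which is not part of tick. It assumes that every realization must have the same number of components and that the timestamps of each component should be one-dimensional and sorted in non-decreasing order. It does not check the dtype.

```python
import numpy as np


def check_realizations(realizations):
    """Hypothetical sanity check for a list of Hawkes realizations.

    Assumes the layout described in the issue: realizations[i][j] is a
    1-D array of timestamps for event category j of patient i.
    Returns the number of components shared by all realizations and
    raises ValueError on the first problem found.
    """
    if len(realizations) == 0:
        raise ValueError("no realizations given")
    n_components = len(realizations[0])
    for i, realization in enumerate(realizations):
        # Every patient should have the same number of event categories.
        if len(realization) != n_components:
            raise ValueError(
                f"realization {i} has {len(realization)} components, "
                f"expected {n_components}"
            )
        for j, timestamps in enumerate(realization):
            arr = np.asarray(timestamps)
            if arr.ndim != 1:
                raise ValueError(f"events[{i}][{j}] is not one-dimensional")
            # Point-process timestamps are assumed to be sorted.
            if np.any(np.diff(arr) < 0):
                raise ValueError(f"events[{i}][{j}] is not sorted")
    return n_components


patients = [
    [np.array([0, 150, 1440, 6960, 8040]), np.array([1140, 2160, 6480]),
     np.array([2190, 2640, 2970, 3210, 4380]), np.array([810, 1200, 1890]),
     np.array([780, 2100, 3510])],
    [np.array([0, 150, 1500, 6660, 8100]), np.array([1170, 2040, 6300]),
     np.array([2070, 2520, 2850, 3090, 4200]), np.array([810, 1260, 1950]),
     np.array([780, 1980, 3390])],
]
print(check_realizations(patients))  # 5
```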

The problem is that I keep getting an error when calling fit (after specifying the kernel parameters): ValueError: Expecting a double numpy array
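One way to locate the cause of this error is to list every array whose dtype is not float64 (the numpy type matching C double). np.array built from Python integers gets an integer dtype, so every array in the extract above would be flagged. The helper find_non_double_arrays is hypothetical, and the assumption that tick's compiled backend only accepts float64 arrays is inferred from the error message rather than from the tick source.

```python
import numpy as np


def find_non_double_arrays(realizations):
    """Return (patient, category, dtype) for every array that is not float64.

    Hypothetical diagnostic helper; it assumes the nested layout
    realizations[i][j] described in the issue.
    """
    offenders = []
    for i, realization in enumerate(realizations):
        for j, timestamps in enumerate(realization):
            dtype = np.asarray(timestamps).dtype
            if dtype != np.float64:
                offenders.append((i, j, str(dtype)))
    return offenders


# The first array is built from Python ints, the second from floats.
events = [[np.array([0, 150, 1440]), np.array([1140.0, 2160.0])]]
for i, j, dtype in find_non_double_arrays(events):
    print(f"events[{i}][{j}] has dtype {dtype}")
```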

Am I doing something wrong?

Mbompr commented 1 year ago

Hi, have you tried casting your data to arrays of doubles?

You could proceed as follows:

import numpy as np

# One inner list per patient (realization), one array per event category.
# The second patient is truncated for brevity.
list_of_list_of_events = [
    [np.array([   0,  150, 1440, 6960, 8040]), np.array([1140, 2160, 6480]), np.array([2190, 2640, 2970, 3210, 4380]), np.array([ 810, 1200, 1890]), np.array([ 780, 2100, 3510])],
    [np.array([   0,  150, 1500])],
]

# Cast every timestamp array to float64 (C double).
casted_list_of_list_of_events = [
    [timestamps.astype(np.float64) for timestamps in list_of_events]
    for list_of_events in list_of_list_of_events
]

print(casted_list_of_list_of_events)

http://tpcg.io/_DQVARO

ibrahimelumari commented 1 year ago

Hi @Mbompr, thank you for your response. I followed your suggestion and it magically worked! Was the problem simply that I was using an int instead of a float data type? I'd appreciate a simple explanation if possible, since I don't think this is explicitly mentioned in the documentation, and the error message I kept getting was not very expressive. The documentation only says:

    fit(events: list, start=None)
    Fit the model according to the given training data.

    Parameters
        events : list of np.array
            The events of each component of the Hawkes. Namely events[j] contains a one-dimensional numpy.array of the events’ timestamps of component j

    ...

Additionally, it would be very helpful to know which data types are acceptable for the events parameter: would plain nested Python lists, or nested lists of np.array, work?
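On the question of acceptable input types: the quoted docstring only documents a list of np.array, and whether tick converts plain nested Python lists by itself is not confirmed in this thread. A conservative approach is to normalize whatever you have (nested Python lists, integer arrays, or a mix) into C-contiguous float64 arrays before calling fit. to_double_events below is a hypothetical helper sketching that conversion; the contiguity requirement is an assumption about the compiled backend, not something stated in the issue.

```python
import numpy as np


def to_double_events(realizations):
    """Hypothetical helper: convert nested timestamps to float64 arrays.

    Accepts realizations[i][j] given as Python lists, tuples or numpy
    arrays of any numeric dtype, and returns a list of lists of
    C-contiguous float64 (C double) arrays.
    """
    return [
        [np.ascontiguousarray(timestamps, dtype=np.float64)
         for timestamps in realization]
        for realization in realizations
    ]


# Plain nested Python lists and integer arrays are both accepted.
raw = [
    [[0, 150, 1440], np.array([1140, 2160])],
    [[0, 150, 1500], np.array([1170, 2040])],
]
events = to_double_events(raw)
print([[a.dtype.name for a in r] for r in events])
# [['float64', 'float64'], ['float64', 'float64']]
```

The converted list is what would then be passed to the fit method of a HawkesExpKern learner; that call is not executed in this sketch.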

Thank you.