How to use TimeGraphicalLasso on a dataset where you only have one observation for each time point

Dedalus9 commented 1 year ago

When I try to use the TimeGraphicalLasso function on a dataset where each time point only has one sample, the function returns only diagonal matrices.

According to the authors who came up with Time-Varying Graphical Lasso (https://dl.acm.org/doi/pdf/10.1145/3097983.3098037), a Time-Varying Graphical Lasso should be "able to estimate a network at a time where there is only one observation."

Is your implementation of this function not designed to handle the extreme case where there is only one observation for each time point? If it isn't, it would be nice if you could adjust the function to handle this case. Otherwise, are there any recommendations you can give for getting your implementation of the time-varying Graphical Lasso to work on a dataset with only one observation per time point?

Thank you!

fdtomasi commented 1 year ago

Hey! Thank you for the question. This is not a bug in the implementation. The default parameters assume that each time step needs to be centered, so each dimension is centered with respect to the other samples belonging to the same time step before sklearn computes the empirical covariance. As there are no other samples, the mean of each dimension is equal to the value of that dimension, so the centered data are all zeros. Hence the computed empirical covariance, which is the actual input of [Latent]TimeVaryingGraphicalLasso, is a zero matrix (see https://github.com/fdtomasi/regain/blob/aec1adbf7075dd1eb2713a08e3a1bda8be175d5c/regain/covariance/time_graphical_lasso_.py#L494).
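To make the centering effect concrete, here is a minimal NumPy-only sketch. The helper `empirical_cov` and the sample values are hypothetical and written just for illustration; it is not the code regain or sklearn runs, only the same maximum-likelihood formula with and without mean removal.

```python
import numpy as np


def empirical_cov(X, assume_centered=False):
    """Maximum-likelihood covariance of the rows of X.

    Hypothetical helper for illustration only; not part of regain or sklearn.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if not assume_centered:
        # Subtract the per-dimension mean computed over the samples in X
        X = X - X.mean(axis=0)
    return X.T @ X / X.shape[0]


# A single observation at one time step (made-up values)
x_t = np.array([[1.0, -2.0, 0.5]])

# Centering a single sample subtracts the sample from itself
cov_centered = empirical_cov(x_t)

# Without centering, the covariance is the outer product of the sample
cov_raw = empirical_cov(x_t, assume_centered=True)

print(cov_centered)
print(cov_raw)
```

With centering the result is all zeros, so the estimator has no off-diagonal information to work with. Without centering, the result is the rank-one outer product of the single sample.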

The snippet below is a minimal working example on a dataset with one sample at each time step. By specifying assume_centered=True, it does not return diagonal matrices.

import numpy as np

from regain.covariance import TimeGraphicalLasso
from regain.datasets import make_dataset

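np.random.seed(42)
# Synthetic data: 3 observed variables, 2 time steps, 1 sample per time step
data = make_dataset(n_dim_lat=0, n_dim_obs=3, T=2, n_samples=1)
# assume_centered=True skips the per-time-step centering
mdl = TimeGraphicalLasso(max_iter=2, assume_centered=True).fit(data.X, data.y)
mdl.precision_  # estimated precision matrices, one per time step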

Output:

array([[[ 188.15111411,  -75.69454836,  181.99571532],
        [ -75.69454836,  291.80183095,   39.39622736],
        [ 181.99571532,   39.39622736, 1056.93539952]],

       [[ 154.80677053,  -90.11168543,  111.06026013],
        [ -90.11168543,  419.93247224,   -1.85976849],
        [ 111.06026013,   -1.85976849,  719.22096281]]])

Dedalus9 commented 1 year ago

Thank you! Your solution worked

fdtomasi / regain

How to use TimeGraphicalLasso on a dataset where you only have one observation for each time point #42