cdt15 / lingam

Python package for causal discovery based on LiNGAM.
https://sites.google.com/view/sshimizu06/lingam
MIT License

Normalization in ICA-LiNGAM #126

Closed: kohashi1999 closed this issue 8 months ago

kohashi1999 commented 8 months ago

Hi,

When I run ICA-LiNGAM on normalized data, it sometimes outputs a DAG whose structure differs from the one it estimates on the same data without normalization. For comparison, I did the same with DirectLiNGAM, and there the structure was identical; only the edge weights differed. Is this a problem specific to ICA-LiNGAM?

The dataset that produced different structures was generated with the following code.

import numpy as np
import pandas as pd

adj_matrix = np.array(
    [
        [0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [10, 1, 0, 0, 0],
        [100, 10, 1, 0, 0],
        [1000, 100, 10, 1, 0],
    ]
)
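n_features = adj_matrix.shape[1]
rng = np.random.default_rng(seed=0)
# Non-Gaussian (uniform) noise: one row per variable, one column per sample
E = rng.uniform(low=-1, high=1, size=(n_features, 10000))
I = np.identity(n_features)
# Solve X = adj_matrix @ X + E for X
X = np.matmul(np.linalg.inv(I - adj_matrix), E)
df = pd.DataFrame(X.T)
# df = df.sub(df.mean()).div(df.std(ddof=0))  # normalization (uncomment to standardize each column)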

Thank you.

sshimizu2006 commented 8 months ago

Yes, different scalings can give different results in ICA-LiNGAM, but not in DirectLiNGAM. In particular, the permutation algorithms used in ICA-LiNGAM can be affected by the scales of the variables. However, the results should be the same for sufficiently large sample sizes.
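
To illustrate why the true structure itself does not depend on scaling, here is a minimal NumPy sketch (not part of the lingam package). It assumes the data follow the linear model x = Bx + e from the issue. Standardizing each variable then gives a model with adjacency matrix D^{-1} B D, where D is the diagonal matrix of standard deviations, so the zero pattern is unchanged and only the weights are rescaled. The names `B_std` and `E_std` are illustrative, and the sample size of 1000 is chosen only to keep the check fast.

```python
import numpy as np

# Adjacency matrix from the issue: B[i, j] is the effect of x_j on x_i.
B = np.array(
    [
        [0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [10, 1, 0, 0, 0],
        [100, 10, 1, 0, 0],
        [1000, 100, 10, 1, 0],
    ],
    dtype=float,
)
n_features = B.shape[0]
rng = np.random.default_rng(seed=0)
E = rng.uniform(low=-1, high=1, size=(n_features, 1000))

# Generate X from x = B x + e, i.e. X = (I - B)^{-1} E.
X = np.linalg.solve(np.identity(n_features) - B, E)

# Standardize each variable (row) to zero mean and unit variance.
std = X.std(axis=1)
Z = (X - X.mean(axis=1, keepdims=True)) / std[:, None]

# Implied model for the standardized data: element (i, j) is B[i, j] * std[j] / std[i].
B_std = B * std[None, :] / std[:, None]
E_std = (E - E.mean(axis=1, keepdims=True)) / std[:, None]

# The zero pattern (the DAG structure) is unchanged; only the weights differ.
same_support = np.array_equal(B_std != 0, B != 0)
# The standardized data satisfy z = B_std z + e_std up to floating-point error.
model_holds = np.allclose(Z, B_std @ Z + E_std)

print("same structure:", same_support)
print("standardized model holds:", model_holds)
print(np.round(B_std, 3))
```

Because the underlying structure is the same in both scalings, any structural difference between the two ICA-LiNGAM runs comes from the estimation procedure on a finite sample, not from the model.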

kohashi1999 commented 8 months ago

Thanks for your reply. I confirmed that ICA-LiNGAM outputs DAGs with the same structure when the sample size is set to 7 million.
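
For readers who want to repeat this kind of check, one way to compare two estimated DAGs is to compare the zero patterns of their adjacency matrices while ignoring the weights. The helper below is a hypothetical sketch, not a lingam function. It assumes each matrix is a NumPy array, such as the `adjacency_matrix_` attribute of a fitted lingam model, and the `threshold` parameter for treating small weights as absent edges is an assumption of this example.

```python
import numpy as np


def same_structure(B1, B2, threshold=0.0):
    """Return True if two adjacency matrices have edges in the same positions.

    An edge j -> i is considered present when |B[i, j]| > threshold.
    Edge weights are otherwise ignored.
    """
    B1 = np.asarray(B1, dtype=float)
    B2 = np.asarray(B2, dtype=float)
    if B1.shape != B2.shape:
        return False
    return bool(np.array_equal(np.abs(B1) > threshold, np.abs(B2) > threshold))


# Toy matrices: same edges with different weights, then one edge reversed.
B_raw = np.array([[0.0, 0.0], [10.0, 0.0]])
B_scaled = np.array([[0.0, 0.0], [0.9, 0.0]])
B_reversed = np.array([[0.0, 0.9], [0.0, 0.0]])

print(same_structure(B_raw, B_scaled))
print(same_structure(B_raw, B_reversed))
```

Comparing zero patterns rather than raw values matters here because normalization is expected to change the edge weights even when the structure is recovered correctly.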