FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License

[BUG] cdt.causality.graph.PC: using PC with most of the allowed values for the CItest argument breaks #97

Status: Open. Opened by Black-Swan-ICL 3 years ago.

Black-Swan-ICL commented 3 years ago

Describe the bug

Running the PC algorithm with CItest='gaussian' works. However, the code crashes with CItest='hsic_gamma', and exactly the same problem occurs with CItest='hsic_perm', 'rcit', 'rcot' and 'hsic_clust'. I did not try 'binary' or 'discrete', since they do not suit my data, which are continuous.
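For context on what the failing options compute: the 'hsic_*' values select kernel (HSIC) independence tests. Below is a minimal NumPy sketch of a permutation-based HSIC test in the spirit of 'hsic_perm'. It is written under my own assumptions (Gaussian kernel, median-heuristic bandwidth, biased HSIC estimator); it is not the R code that cdt calls, and the names `rbf_gram` and `hsic_perm_test` are hypothetical.

```python
import numpy as np


def rbf_gram(v) -> np.ndarray:
    """Gaussian-kernel Gram matrix of a 1-D sample (median-heuristic bandwidth)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    sq = (v - v.T) ** 2                           # pairwise squared distances
    pos = sq[sq > 0]
    scale = np.median(pos) if pos.size else 1.0   # guard for a constant column (assumption)
    return np.exp(-sq / scale)


def hsic_perm_test(x, y, n_perm: int = 500, seed: int = 0):
    """Return (HSIC statistic, permutation p-value) for H0: x independent of y."""
    K, L = rbf_gram(x), rbf_gram(y)
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n                       # centering matrix I - 11^T / n
    Kc = H @ K @ H
    stat = np.sum(Kc * L) / n**2                  # biased HSIC: trace(K H L H) / n^2
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        p = rng.permutation(n)                    # shuffling y destroys any dependence on x
        null[b] = np.sum(Kc * L[np.ix_(p, p)]) / n**2
    pval = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, pval
```

On data such as y = x² plus noise, where the linear correlation is close to zero, a test like this should reject independence, which is the main reason to prefer a kernel test over 'gaussian'. Each permutation costs O(n²), so the sample size should stay moderate.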

Please find below the details.

My configuration

Detailed hardware and distribution:

System:

CPU:

Graphics:

Data I use

I am using a toy dataset, 'test_samples.csv' (attached to the original issue), generated from a structural causal model.

The structural causal model is defined as:

$$
\begin{aligned}
X_0 &:= U_0\\
X_1 &:= U_1\\
X_2 &:= U_2\\
X_3 &:= 2X_0 + 3X_1 + U_3\\
X_4 &:= 4X_1 + U_4\\
X_5 &:= 5X_4 + 7X_6 + U_5\\
X_6 &:= 6X_2 + U_6,
\end{aligned}
$$

where the $U_i$ are mutually independent exogenous noise variables.
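To reproduce data of this kind without the attachment, the SCM can be sampled directly. This is a minimal sketch: the issue does not state the noise distribution or sample size, so standard Gaussian $U_i$, n = 1000, the column names and the helper names `sample_scm`/`write_csv` are all assumptions.

```python
import numpy as np

# Hypothetical column names; the headers of the attached CSV are not known.
COLUMNS = [f"X_{i}" for i in range(7)]


def sample_scm(n: int = 1000, seed: int = 0) -> np.ndarray:
    """Draw n samples from the 7-variable linear SCM; returns an (n, 7) array."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, 7))  # assumed N(0, 1) exogenous noise U_0..U_6
    x = np.empty_like(u)
    x[:, 0] = u[:, 0]
    x[:, 1] = u[:, 1]
    x[:, 2] = u[:, 2]
    x[:, 3] = 2 * x[:, 0] + 3 * x[:, 1] + u[:, 3]
    x[:, 4] = 4 * x[:, 1] + u[:, 4]
    x[:, 6] = 6 * x[:, 2] + u[:, 6]  # X_6 must be computed before X_5 uses it
    x[:, 5] = 5 * x[:, 4] + 7 * x[:, 6] + u[:, 5]
    return x


def write_csv(path: str, n: int = 1000, seed: int = 0) -> None:
    """Save samples with a plain header row so pd.read_csv(path) recovers the names."""
    np.savetxt(path, sample_scm(n, seed), delimiter=",",
               header=",".join(COLUMNS), comments="")
```

Calling `write_csv("test_samples.csv")` produces a file that the snippet further down can read with `pd.read_csv`.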

The same problem occurs with other data, for instance the 'sachs' dataset shipped with the cdt package.

Traceback

/home/blackswan/.pyenv/versions/venv_cdt_py3-7/bin/python /home/blackswan/pythonprojects/causality/cdt_test.py
Detecting 1 CUDA device(s).

R Python Error Output
-----------------------

[Errno 2] No such file or directory: '/tmp/cdt_pc_7bd64676-0b9d-46ad-be3d-f0db66f0eb73/result.csv'
Traceback (most recent call last):
  File "/home/blackswan/pythonprojects/causality/cdt_test.py", line 24, in <module>
    output2 = pc2.predict(data)
  File "/home/blackswan/.pyenv/versions/venv_cdt_py3-7/lib/python3.7/site-packages/cdt/causality/graph/model.py", line 63, in predict
    return self.create_graph_from_data(df_data, **kwargs)
  File "/home/blackswan/.pyenv/versions/venv_cdt_py3-7/lib/python3.7/site-packages/cdt/causality/graph/PC.py", line 278, in create_graph_from_data
    results = self._run_pc(data, verbose=self.verbose)
  File "/home/blackswan/.pyenv/versions/venv_cdt_py3-7/lib/python3.7/site-packages/cdt/causality/graph/PC.py", line 315, in _run_pc
    raise e
  File "/home/blackswan/.pyenv/versions/venv_cdt_py3-7/lib/python3.7/site-packages/cdt/causality/graph/PC.py", line 311, in _run_pc
    self.arguments, output_function=retrieve_result, verbose=verbose)
  File "/home/blackswan/.pyenv/versions/venv_cdt_py3-7/lib/python3.7/site-packages/cdt/utils/R.py", line 221, in launch_R_script
    raise RuntimeError("RProcessError \nR Process Error Output \n-----------------------\n" + str(err, "ISO-8859-1")) from None
RuntimeError: RProcessError
R Process Error Output
-----------------------
Loading required package: momentchi2
Loading required package: MASS
Error in skeleton(suffStat, indepTest, alpha, labels = labels, method = skel.method,  :
  Evaluation error: missing value where TRUE/FALSE needed.
Calls: runPC -> <Anonymous> -> skeleton
Execution halted

Process finished with exit code 1
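The Python-side "[Errno 2] No such file or directory: .../result.csv" appears to be a downstream symptom: the R script halted ("Execution halted") before it could write result.csv. The informative part is the R stderr. In R, "missing value where TRUE/FALSE needed" is the error raised when an `if` condition evaluates to NA; where that NA originates inside the R script is not established here. Below is a small hypothetical helper (`r_error_block`) that pulls the R stderr out of the RuntimeError message, assuming the message layout shown at the `raise RuntimeError(...)` line in the traceback.

```python
def r_error_block(message: str) -> str:
    """Return the R stderr part of a cdt RProcessError message, minus package-loading lines."""
    marker = "-----------------------\n"  # separator cdt puts before the R stderr
    _, found, stderr = message.partition(marker)
    body = stderr if found else message   # fall back to the whole message
    lines = [ln for ln in body.splitlines()
             if ln.strip() and not ln.startswith("Loading required package")]
    return "\n".join(lines)
```

Typical use: wrap `pc2.predict(data)` in `try`/`except RuntimeError as exc:` and print `r_error_block(str(exc))`.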

Code snippet


import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

import cdt

if __name__ == '__main__':

    cdt.SETTINGS.rpath = '/usr/bin/Rscript'

    data = pd.read_csv('test_samples.csv')

    # Works
    pc1 = cdt.causality.graph.PC(CItest='gaussian')
    output1 = pc1.predict(data)
    nx.draw_networkx(output1)
    plt.show()

    # Doesn't work
    pc2 = cdt.causality.graph.PC(CItest='hsic_gamma')
    output2 = pc2.predict(data)
    nx.draw_networkx(output2)
    plt.show()

Running with CItest='gaussian' works and produces a graph (the plot, PC_with_gaussian, was attached to the original issue).
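For comparison with the working option: the standard Gaussian CI test for PC is a Fisher-z test on the partial correlation, and 'gaussian' presumably delegates to such a test in R (an assumption, not checked against cdt's source). A minimal NumPy sketch, with the hypothetical name `gauss_ci_test`:

```python
import math

import numpy as np


def gauss_ci_test(data: np.ndarray, i: int, j: int, cond=()) -> float:
    """p-value for H0: column i independent of column j given columns `cond` (Gaussian data)."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)        # correlation of the selected columns
    prec = np.linalg.pinv(corr)                            # its (pseudo-)inverse
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation of i and j
    r = min(max(r, -0.9999999), 0.9999999)                 # keep the log finite
    n = data.shape[0]                                      # requires n > len(cond) + 3
    z = math.sqrt(n - len(cond) - 3) * 0.5 * math.log((1 + r) / (1 - r))  # Fisher z
    return math.erfc(abs(z) / math.sqrt(2))                # two-sided: 2 * (1 - Phi(|z|))
```

On the SCM above, X_0 and X_1 are marginally independent but strongly dependent given their common child X_3; this test only sees linear dependence, which is why the kernel options exist.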

Running with 'hsic_gamma' produces the error trace shown above.
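Until the bug is fixed, one stopgap is to try the CI tests in order and keep the first one that does not fail. This is a workaround sketch, not a fix; `predict_with_fallback` is a hypothetical helper, and it relies only on the fact (visible in the traceback) that the failure surfaces in Python as a RuntimeError from `predict`. The model factory is injected so the helper does not depend on cdt directly.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def predict_with_fallback(
    make_model: Callable[[str], Any],
    data: Any,
    ci_tests: Iterable[str] = ("hsic_gamma", "gaussian"),
) -> Tuple[str, Any]:
    """Try each CI test in order; return (test_name, graph) for the first that succeeds."""
    errors: Dict[str, Exception] = {}
    for name in ci_tests:
        try:
            return name, make_model(name).predict(data)
        except RuntimeError as exc:  # R-side failures are re-raised as RuntimeError
            errors[name] = exc
    raise RuntimeError(f"all CI tests failed: {sorted(errors)}")
```

With cdt this would be called as `predict_with_fallback(lambda t: cdt.causality.graph.PC(CItest=t), data)`. Because it silently switches to a different test, the returned test name should be logged alongside the graph.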

Eighonet commented 7 months ago

@Diviyan-Kalainathan Any updates? I ran into the same problem with another dataset; the situation is exactly the same.