FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License

[BUG] CAM score=linear #79

Open manonromain opened 3 years ago

manonromain commented 3 years ago

Hi!

I'm working with CAM, and the linear score, which I need, doesn't work. Running:

import numpy as np
import pandas as pd
from cdt.causality.graph import CAM
X = np.random.random((1000, 30))
df = pd.DataFrame(X)
obj = CAM(score="linear")
output = obj.predict(df)

produces the error:

R Process Error Output 
-----------------------
Loading required package: glmnet
Loading required package: Matrix
Loaded glmnet 4.0-2
Loading required package: mboost
Loading required package: parallel
Loading required package: stabs
This is mboost 2.9-3. See ‘package?mboost’ and ‘news(package  = "mboost")’
for a complete list of changes.

Attaching package: ‘mboost’

The following object is masked from ‘package:glmnet’:

    Cindex

Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
Warning message:
replacing previous import ‘glmnet::Cindex’ by ‘mboost::Cindex’ when loading ‘CAM’ 
Error in scoreUpdate - scoreNodes[j] : 
  non-numeric argument to binary operator
Calls: CAM -> updateScoreMat
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  all scheduled cores encountered errors in user code
Execution halted

Leaving the default score (i.e. "non-linear") works, though.

I'm working with cdt version 0.5.21 and Python 3.8.2.

Thanks for your help!

Otherwise, the package is great, thank you very much!!

diviyank commented 3 years ago

This is strange; it might come directly from the CAM package. Can you try running the same algorithm on a sample dataset (cdt.data.load_dataset('sachs'), for example)?

Thanks a lot, Diviyan

manonromain commented 3 years ago

I get exactly the same error. Maybe it is my installation but the default score works fine.

So the following code:

from cdt.causality.graph import CAM
from cdt.data import load_dataset
data, graph = load_dataset("sachs")
obj = CAM(score="linear")
output = obj.predict(data)

gets the following error:

R Python Error Output 

[Errno 2] No such file or directory: '/var/folders/qd/mw9xvjtx7m7grvv27vxndl2r0000gn/T/cdt_cam_b793ff6b-6b9a-43c7-ae60-3d662bc8f15a/result.csv'

RuntimeError                              Traceback (most recent call last)
<ipython-input-60-fbc4819c5991> in <module>
      3 data, graph = load_dataset("sachs")
      4 obj = CAM(score="linear")
----> 5 output = obj.predict(data)

/opt/anaconda3/lib/python3.8/site-packages/cdt/causality/graph/model.py in predict(self, df_data, graph, **kwargs)
     61         """
     62         if graph is None:
---> 63             return self.create_graph_from_data(df_data, **kwargs)
     64         elif isinstance(graph, nx.DiGraph):
     65             return self.orient_directed_graph(df_data, graph, **kwargs)

/opt/anaconda3/lib/python3.8/site-packages/cdt/causality/graph/CAM.py in create_graph_from_data(self, data, **kwargs)
    180         self.arguments['{NJOBS}'] = str(self.njobs)
    181         self.arguments['{VERBOSE}'] = str(self.verbose).upper()
--> 182         results = self._run_cam(data, verbose=self.verbose)
    183 
    184         return nx.relabel_nodes(nx.DiGraph(results),

/opt/anaconda3/lib/python3.8/site-packages/cdt/causality/graph/CAM.py in _run_cam(self, data, fixedGaps, verbose)
    204         except Exception as e:
    205             rmtree(run_dir)
--> 206             raise e
    207         except KeyboardInterrupt:
    208             rmtree(run_dir)

/opt/anaconda3/lib/python3.8/site-packages/cdt/causality/graph/CAM.py in _run_cam(self, data, fixedGaps, verbose)
    199         try:
    200             data.to_csv(Path('{}/data.csv'.format(run_dir)), header=False, index=False)
--> 201             cam_result = launch_R_script(Path("{}/R_templates/cam.R".format(os.path.dirname(os.path.realpath(__file__)))),
    202                                          self.arguments, output_function=retrieve_result, verbose=verbose)
    203         # Cleanup

/opt/anaconda3/lib/python3.8/site-packages/cdt/utils/R.py in launch_R_script(template, arguments, output_function, verbose, debug)
    219                 print("\nR Python Error Output \n-----------------------\n")
    220                 print(e)
--> 221                 raise RuntimeError("RProcessError \nR Process Error Output \n-----------------------\n" + str(err, "ISO-8859-1")) from None
    222             print("\nR Python Error Output \n-----------------------\n")
    223             print(e)

RuntimeError: RProcessError 
R Process Error Output 

Loading required package: glmnet
Loading required package: Matrix
Loaded glmnet 4.0-2
Loading required package: mboost
Loading required package: parallel
Loading required package: stabs
This is mboost 2.9-3. See ‘package?mboost’ and ‘news(package  = "mboost")’
for a complete list of changes.

Attaching package: ‘mboost’

The following object is masked from ‘package:glmnet’:
       Cindex
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
Warning message:
replacing previous import ‘glmnet::Cindex’ by ‘mboost::Cindex’ when loading ‘CAM’ 
Error in scoreUpdate - scoreNodes[j] : 
  non-numeric argument to binary operator
Calls: CAM -> updateScoreMat
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  all scheduled cores encountered errors in user code
Execution halted

Do you have any idea?

Thanks!

diviyank commented 3 years ago

Hello, I think it comes from the CAM R package. This might take some time to fix, as the fix will come with the recoding of all the R packages into Python.

Sorry, I don't have any clear answer for this. Best,
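
Until a fix lands, one possible stopgap is to try the linear score first and fall back to the default score when the R process fails. The sketch below is only an illustration: `predict_with_fallback` is a hypothetical helper, not part of cdt, and it assumes the failure surfaces as a `RuntimeError`, as in the traceback above. The default score fits a different model, so a fallback result is not a substitute for the linear one.

```python
def predict_with_fallback(candidates, data):
    """Try each (label, factory) pair in order.

    `candidates` is a list of (label, zero-argument callable returning a model).
    Returns (label, prediction) for the first model whose predict() succeeds.
    Hypothetical helper for illustration; not part of cdt.
    """
    failed = []
    for label, make_model in candidates:
        try:
            return label, make_model().predict(data)
        except RuntimeError:
            # cdt wraps R process failures in RuntimeError (see traceback above).
            failed.append(label)
    raise RuntimeError("all candidates failed: " + ", ".join(failed))


# Intended usage with cdt (commented out so the sketch runs without R installed):
# from cdt.causality.graph import CAM
# label, graph = predict_with_fallback(
#     [("linear", lambda: CAM(score="linear")), ("default", lambda: CAM())],
#     df,
# )
```

The returned label records which score actually produced the graph, so a fallback is never mistaken for a linear-score result.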

diviyank commented 3 years ago

With the new package version I stumbled across a new error:

R Process Error Output 
-----------------------
Loading required package: glmnet
Loading required package: Matrix
Loaded glmnet 4.0-2
Loading required package: mboost
Loading required package: parallel
Loading required package: stabs
This is mboost 2.9-2.1. See ‘package?mboost’ and ‘news(package  = "mboost")’
for a complete list of changes.

Attaching package: ‘mboost’

The following object is masked from ‘package:glmnet’:

    Cindex

Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-31. For overview type 'help("mgcv-package")'.
Warning message:
replacing previous import ‘glmnet::Cindex’ by ‘mboost::Cindex’ when loading ‘CAM’ 
Error in model.frame.default(formula = y ~ X, drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'X'
Calls: CAM ... lm -> eval -> eval -> <Anonymous> -> model.frame.default
Execution halted
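
In the outputs above, the actual R failure is buried under package-loading messages. A minimal sketch for pulling out only the `Error in ...` entries from such text follows. `extract_r_errors` is a hypothetical helper, not part of cdt, and it assumes the layout seen in this thread: an `Error in <call> :` line followed by an indented message line.

```python
import re


def extract_r_errors(stderr_text):
    """Return (call, message) pairs for each 'Error in <call> :' entry.

    Assumes the message sits on the indented line(s) right after the
    'Error in' line, as in the outputs in this thread.
    """
    errors = []
    lines = stderr_text.splitlines()
    for i, line in enumerate(lines):
        match = re.match(r"^Error in (.*?)\s*:\s*$", line)
        if match is None:
            continue
        # Collect the indented continuation lines that carry the message.
        message_parts = []
        for follow in lines[i + 1:]:
            if not follow.startswith((" ", "\t")):
                break
            message_parts.append(follow.strip())
        errors.append((match.group(1), " ".join(message_parts)))
    return errors


sample_stderr = (
    "Loading required package: glmnet\n"
    "Error in model.frame.default(formula = y ~ X, drop.unused.levels = TRUE) : \n"
    "  invalid type (list) for variable 'X'\n"
    "Calls: CAM ... lm -> eval -> eval -> <Anonymous> -> model.frame.default\n"
    "Execution halted\n"
)
for call, message in extract_r_errors(sample_stderr):
    print(call, "->", message)
```

Errors that R prints on a single line (call and message together) are not matched by this pattern and would need a separate case.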