FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License

Is Graphviz needed when using R's pcalg? #48

Closed ArnoVel closed 4 years ago

ArnoVel commented 4 years ago

My question is related to these explanations about the pcalg package in R. I am currently installing the required packages to run CDT's graph-related PC.py. As the plotting is done using networkx, I figured Graphviz would not be needed.

Am I guessing correctly? Is there anything more I should know about these R requirements (say, about the path to the packages or some environment variables)?

Thanks, A.V

diviyank commented 4 years ago

Hi, I don't know whether Graphviz is among the requirements of the pcalg package. If it isn't, we don't need it. Installing all the packages can be quite a hassle.

The file /install-deps/install-dependencies.sh includes the installation commands for Debian-based systems:

    #!/bin/bash

    apt-get update
    DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata
    apt-get -q install r-base -y --allow-unauthenticated
    apt-get -q install libssl-dev -y
    apt-get -q install libgmp3-dev -y --allow-unauthenticated
    apt-get -q install git -y
    apt-get -q install build-essential -y --allow-unauthenticated
    apt-get -q install libv8-3.14-dev -y --allow-unauthenticated
    apt-get -q install libcurl4-openssl-dev -y --allow-unauthenticated
    Rscript -e 'install.packages(c("V8","sfsmisc","clue","randomForest","lattice","devtools","MASS"),repos="http://cran.us.r-project.org")'
    Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite(c("CAM", "SID", "bnlearn", "pcalg", "kpcalg", "D2C"))'
    Rscript -e 'library(devtools); install_github("cran/momentchi2"); install_github("Diviyan-Kalainathan/RCIT")'
    Rscript -e 'install.packages(c("sparsebn"),repos="http://cran.us.r-project.org")'

An easy alternative would be to use the docker images. Best regards, Diviyan

ArnoVel commented 4 years ago

Hi, Sadly, as I was manually installing the requirements (which you list here: c("CAM", "SID", "bnlearn", "pcalg", "kpcalg", "D2C") ), the following situation appeared:

    Warning messages:
    1: package ‘Rgraphviz’ is not available (for R version 3.6.1)
    2: In install.packages(c("randomForest", "gRbase", "lazy", "infotheo",  :
      installation of package ‘gRbase’ had non-zero exit status
    > install.packages(pkgs=pkf, type="source", repos=NULL)
    Installing package into ‘/usr/local/lib/R/site-library’
    (as ‘lib’ is unspecified)
    ERROR: dependencies ‘gRbase’, ‘Rgraphviz’ are not available for package ‘D2C’
    * removing ‘/usr/local/lib/R/site-library/D2C’

Mind you, I have R 3.6 because mvtnorm, which another dependency needs, was not available for R 3.4.4.

Will try with BiocManager (biocLite doesn't work with R >= 3.5, I think).

Thanks for the help, will update if I have additional problems

Update: I do have a Debian-based system (Ubuntu 18.04 LTS), but mvtnorm requires R >= 3.5, and apparently D2C is archived and depends on Rgraphviz. BiocManager can't find D2C because it is archived, and neither can install.packages().

    > BiocManager::install("D2C")
    Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6
    Installing package(s) 'D2C'
    Warning message:
    package ‘D2C’ is not available (for R version 3.6.1)

Final question: If my only goal is to run the PC algorithm on Sachs with different versions (heuristics) of KCI-test, would I need to have D2C and therefore Rgraphviz?

Thank you, I will continue the installation process tomorrow, Regards, A.V
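
Editorial sketch, not part of the original comment: the switch between biocLite and BiocManager described above can be written as a small Python helper that builds the R expression to hand to `Rscript -e`. The function name `bioc_install_expr` is hypothetical, and the R 3.5 cutoff is only the assumption voiced in the comment ("I think"), not a verified boundary.

```python
def bioc_install_expr(r_version, packages):
    """Build an R expression that installs `packages` through Bioconductor.

    Assumption taken from the thread: biocLite stops working at R 3.5,
    so newer R versions go through BiocManager instead.
    """
    # Compare only the major and minor parts, e.g. "3.6.1" -> (3, 6).
    major, minor = (int(part) for part in r_version.split(".")[:2])
    pkgs = ", ".join(f'"{name}"' for name in packages)
    if (major, minor) >= (3, 5):
        # Install BiocManager first if it is missing, then use it.
        return (
            'if (!requireNamespace("BiocManager", quietly = TRUE)) '
            'install.packages("BiocManager", '
            'repos = "http://cran.us.r-project.org"); '
            f"BiocManager::install(c({pkgs}))"
        )
    # Older R: the biocLite call used in install-dependencies.sh.
    return f'source("http://bioconductor.org/biocLite.R"); biocLite(c({pkgs}))'


if __name__ == "__main__":
    print(bioc_install_expr("3.6.1", ["pcalg", "kpcalg"]))
```

The returned string can be passed as the argument of `Rscript -e`. It does not help with archived packages: as the output above shows, BiocManager still reports D2C as unavailable.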

diviyank commented 4 years ago

Hi, D2C is not required at the moment. I planned to add the algorithm, so I added the requirement in advance, but nothing uses it yet.

So maybe skip D2C. The R requirements are managed independently for each algorithm, so no issues there. To run PC, you only need pcalg, kpcalg and RCIT. Install RCIT from my fork; it contains an adaptation of the author's code that makes it work with CDT.

from cdt.causality.graph.PC:

    def __init__(self, CItest="gaussian", method_indep='corr', alpha=0.01,
                 njobs=None, verbose=None):
        """Init the model and its available arguments."""
        if not (RPackages.pcalg and RPackages.kpcalg and RPackages.RCIT):
            raise ImportError("R Package (k)pcalg/RCIT is not available. "
                              "RCIT has to be installed from "
                              "https://github.com/Diviyan-Kalainathan/RCIT")

I should document the required R packages for each algorithm.

Best, Diviyan
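
Editorial sketch, not part of the original reply: before calling PC, one can check from Python whether the three R packages named above are installed. This is a standalone check that shells out to `Rscript`; it is not how CDT's own `RPackages` object works. The names `PC_R_PACKAGES`, `r_check_expr` and `missing_r_packages` are hypothetical.

```python
import shutil
import subprocess

# R packages the reply above lists as sufficient for PC.
PC_R_PACKAGES = ("pcalg", "kpcalg", "RCIT")


def r_check_expr(packages):
    """Return an R expression printing the not-yet-installed packages, one per line."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return (
        f"pkgs <- c({quoted}); "
        'cat(pkgs[!(pkgs %in% rownames(installed.packages()))], sep = "\\n")'
    )


def missing_r_packages(packages=PC_R_PACKAGES, rscript="Rscript"):
    """List missing R packages, or return None when Rscript is not on PATH."""
    if shutil.which(rscript) is None:
        return None
    result = subprocess.run(
        [rscript, "-e", r_check_expr(packages)],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    missing = missing_r_packages()
    if missing is None:
        print("Rscript not found on PATH")
    elif missing:
        print("Missing R packages:", ", ".join(missing))
    else:
        print("All R packages needed for PC are installed")
```

An empty list means all three packages are visible to the R installation that `Rscript` resolves to. If CDT is configured to use a different R installation, the result may differ.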

ArnoVel commented 4 years ago

Back to work, was delighted to find this reply! This will save me tons of time, thanks a lot :)

diviyank commented 4 years ago

No problem :) ! I will improve the documentation in the next version

diviyank commented 4 years ago

It should be done! I will close this issue; don't hesitate to reopen it if a problem arises.