JanMarcoRuizdeVargas / clustercausal

The Repository supporting my Master's Thesis at TUM.
GNU Affero General Public License v3.0

Cluster DAGs as background knowledge for causal discovery

This repository was developed as part of my master's thesis at TUM.

I use the novel C-DAG (https://arxiv.org/abs/2202.12263) framework as background knowledge for causal discovery.

Find my thesis at https://sharelatex.tum.de/read/npjkjggtqffh . Find the data used for the simulation studies at https://drive.google.com/drive/folders/1EViPZTdyvURl0vlQWfsmtFgypXUi0zym?usp=sharing .

Installation (requires Python 3.10):

Clone the repository in a folder of your choice:

git clone https://github.com/JanMarcoRuizdeVargas/clustercausal.git

Some folder paths in the notebooks and in clustercausal/experiments/run_gridsearch.py are written for Windows and may need to be adjusted to work properly on macOS or Linux.
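One way to make such paths portable is to build them with pathlib instead of hard-coding Windows separators. This is a minimal sketch, not code from the repository: the helper name build_path and the folder names results and gridsearch are hypothetical placeholders.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def build_path(base, *parts):
    """Join path components using the separator of the current OS."""
    return Path(base).joinpath(*parts)


# Hypothetical folder names, for illustration only.
parts = ("clustercausal", "experiments", "results", "gridsearch")
results_dir = build_path(*parts)

# The same components rendered explicitly for each platform.
windows_style = str(PureWindowsPath(*parts))
posix_style = str(PurePosixPath(*parts))

# A relative Windows-style string converted to forward slashes,
# which both Windows and macOS/Linux accept in Python.
converted = PureWindowsPath(r"clustercausal\experiments").as_posix()
```

Replacing a hard-coded string with a Path built this way means the same notebook cell should run unchanged on Windows, macOS, and Linux.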

Change directory to the repository (Windows):

cd .\clustercausal\

macOS:

cd clustercausal

Create a virtual environment (Windows):

python -m venv env

macOS:

python3 -m venv env

Activate the virtual environment (Windows):

.\env\Scripts\activate

macOS:

source env/bin/activate

(To deactivate the virtual environment when you are done, run deactivate on both Windows and macOS.)

Install the requirements (same for Windows and macOS):

pip install -r requirements.txt
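Because the installation assumes Python 3.10, it can help to confirm which interpreter the activated environment uses. The check below is a small sketch that assumes "requires Python 3.10" means the 3.10 series specifically. The name is_supported is hypothetical and not part of the repository.

```python
import sys

# Assumption: the 3.10 series is meant, per the installation note.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if the major.minor version equals `required`."""
    return tuple(version_info[:2]) == required


if not is_supported():
    print(
        f"Warning: running Python {sys.version_info[0]}.{sys.version_info[1]}, "
        f"but {REQUIRED[0]}.{REQUIRED[1]} is expected."
    )
```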

To calculate the SID (Structural Intervention Distance) metric, R must be installed (version 4.3.1 is recommended). In addition, change os.environ["R_HOME"] = "C:\Program Files\R\R-4.3.1" in clustercausal/experiments/Evaluator.py so that it points to the path where R is installed.
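As an alternative to editing a hard-coded path, R_HOME could be resolved at runtime. The sketch below is an assumption about how this might be done, not what Evaluator.py does. The helper name resolve_r_home is hypothetical. It relies on the R RHOME command, which prints R's home directory when R is on the PATH.

```python
import os
import shutil
import subprocess


def resolve_r_home(default=None):
    """Return an R home directory, or `default` if none can be found.

    Order of preference: an existing R_HOME variable, then the output
    of `R RHOME` if an R executable is on the PATH.
    """
    env_value = os.environ.get("R_HOME")
    if env_value:
        return env_value
    r_executable = shutil.which("R")
    if r_executable is not None:
        try:
            result = subprocess.run(
                [r_executable, "RHOME"],
                capture_output=True, text=True, check=True, timeout=30,
            )
        except (OSError, subprocess.SubprocessError):
            return default
        detected = result.stdout.strip()
        if detected:
            return detected
    return default


# Fall back to the Windows path named in this README if nothing is detected.
r_home = resolve_r_home(default=r"C:\Program Files\R\R-4.3.1")
if r_home is not None:
    os.environ["R_HOME"] = r_home
```

For this to have any effect, it would need to run before any code that reads R_HOME.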

Usage:

For custom graphs see Cluster_PC_example.ipynb.

For simulation studies and evaluation see Cluster_PC_simulation_gridsearch.ipynb and Cluster_PC_simulation_mass_simulation.ipynb.

Tests:

pytest --cov