Plumb, Gregory, et al. "Explaining Groups of Points in Low-Dimensional Representations." International Conference on Machine Learning. PMLR, 2020.
This repository contains the implementation and a reproduction of the experiments presented in the above paper. The original paper proposes Transitive Global Translations (TGT), an algorithm that explains the differences between groups of points in a low-dimensional representation. In this repository, the claims and results of the paper are studied and validated. The code provided by Plumb et al. was first tested; it was then reimplemented in PyTorch, and TGT was extended with a scaling mechanism.
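For intuition, below is a minimal PyTorch sketch of a TGT-style objective under our reading of the paper; the function and argument names are ours for illustration and are not the API of the eldr package.

```python
# Illustrative sketch of a TGT-style objective (our reading of Plumb et al.,
# 2020) -- not the exact implementation in eldr. `r` is a trained encoder to
# the low-dimensional space; explanations are input-space translations.
import torch

def tgt_loss(r, means_x, means_z, deltas, l1_weight=0.1):
    # means_x: (k, d) group means in input space
    # means_z: (k, 2) group means in the low-dimensional representation
    # deltas:  (k, d) learnable translations relative to a reference group,
    #          so the explanation i -> j is deltas[j] - deltas[i]
    k = means_x.shape[0]
    loss = means_x.new_zeros(())
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            # Translate group i in input space and require the encoder to
            # map it onto group j's representation mean.
            z = r(means_x[i] + (deltas[j] - deltas[i]))
            loss = loss + torch.sum((z - means_z[j]) ** 2)
    # An l1 penalty keeps the explanations sparse and interpretable.
    return loss + l1_weight * deltas.abs().sum()
```

Parameterizing each pairwise explanation as a difference of per-group translations is what makes TGT transitive: the i → j and j → k explanations compose exactly into the i → k one. Our scaling extension additionally learns a per-feature exponential scaling of the input (see the --use_scaling flag of main.py below).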
This repository depends on two external packages for reproducing experiments from the original codebase. Please run the following commands in the root directory of this repo:
git clone https://github.com/GDPlumb/ELDR.git
cd ./ELDR-reproduction && git clone https://github.com/shahcompbio/scvis.git
cd ./ELDR-reproduction/scvis && python setup.py install
Please note that in the file ELDR/Code/load_scvis.py, you might have to replace the config path containing /scvis/lib/scvis/config/model_config.yaml with ./../scvis/lib/scvis/config/model_config.yaml.
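As an optional sanity check (not part of the original instructions), you can confirm that scvis is importable from the environment you installed it into; if this fails, revisit the setup step above:

```bash
python -c "import scvis"
```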
It is recommended to adhere to the following workflow when using this repository. Please use the scripts in the scripts directory to run the experiments.
+-- Data (Data files used in the experiments)
+-- ELDR-reproduction (Reproduction of experiments from author's code)
+-- Environments (YAML environment files for Conda)
+-- Models (Pretrained scvis models from our experiments)
+-- configs (JSON config files for TGT, AE and VAE models)
+-- eldr (Our PyTorch implementation)
+-- experiments (IPython notebooks of our extension experiments)
+-- scripts (Bash scripts for running TGT experiments with different datasets, calls main.py)
+-- main.py (Main entry file for running experiments (training TGT) with CLI)
+-- trainr.py (Script to train low-dimensional representations)
We provide the trained models and explanations. To run them, navigate to the corresponding experiment in the ./experiments directory and follow the .ipynb notebook.
We provide the following environment files at ./Environments:

| file_name | env_name |
|---|---|
| local_tensorflow_env.yml | 2factai |
| pytorch_env.yml | factai |
To install and activate an environment:
conda env create -f ./Environments/$file_name
conda activate $env_name
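For example, to create and activate the PyTorch environment used by our reimplementation:

```bash
conda env create -f ./Environments/pytorch_env.yml
conda activate factai
```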
Note that these environments do not include scvis, as it cannot be installed through conda. The installation process is documented in the scvis GitHub repository.
As described in the workflow above, trainr.py trains the low-dimensional representation learning function. Its usage is as follows:
usage: trainr.py [-h] [--model_type {vae,autoencoder}]
[--pretrained_path PRETRAINED_PATH] [--model_dir MODEL_DIR]
[--data_path DATA_PATH] [--train] [--dataset DATASET]
optional arguments:
-h, --help show this help message and exit
--model_type {vae,autoencoder}
Type of model for Learning low dimensional
representations (default: vae)
--pretrained_path PRETRAINED_PATH
Path to the trained model (default: ./Models/vae.pt)
--model_dir MODEL_DIR
Path to save the trained model (default: ./Models)
--data_path DATA_PATH
Path of the data to use (default: ./ELDR/Housing/Data)
--train Do you want to train? (default: False)
--dataset DATASET Dataset on which you are training or equivalently
exp_name. Trained model will be saved with this name.
(default: random)
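For example, the following illustrative invocation trains a VAE representation on the Housing data with the default paths listed above (adjust to your setup):

```bash
python trainr.py --model_type vae --train \
    --data_path ./ELDR/Housing/Data --model_dir ./Models --dataset housing
```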
The script main.py trains the explanations between the groups. Its usage is as follows:
usage: main.py [-h] [--model_type {vae,autoencoder}]
[--pretrained_path PRETRAINED_PATH] [--data_path DATA_PATH]
[--num_clusters NUM_CLUSTERS] [--xydata] [--exp_name EXP_NAME]
[--use_scaling]
optional arguments:
-h, --help show this help message and exit
--model_type {vae,autoencoder}
Type of model for Learning low dimensional
representations (default: vae)
--pretrained_path PRETRAINED_PATH
Path to the trained model (default: ./Models/vae.pt)
--data_path DATA_PATH
Path of the data to use (default: ./ELDR/Housing/Data)
--num_clusters NUM_CLUSTERS
Number of Clusters (default: 6)
--xydata             Labels and data stored separately (default: False)
--exp_name EXP_NAME Name of the experiment. Everything will be saved at
./experiments/$exp_name$ (default: Housing)
--use_scaling Use extended explanations with exponential scaling
(default: False)
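Continuing the example above, an illustrative run that learns TGT explanations between 6 clusters of the Housing representation, with our scaling extension enabled (the exact checkpoint filename depends on the --dataset name you passed to trainr.py):

```bash
python main.py --model_type vae --pretrained_path ./Models/housing.pt \
    --data_path ./ELDR/Housing/Data --num_clusters 6 \
    --exp_name Housing --use_scaling
```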
We thank the original authors for making their code public. We would also like to thank the lead author, Gregory Plumb, for promptly replying to our emails with suggestions and help. Finally, a great thanks to Christina Winkler for her feedback throughout the project.
Rajeev Verma, Paras Dahal, Jim Wagemans and Auke Elfrink, University of Amsterdam
[1] Jiarui Ding, Anne Condon, and Sohrab P. Shah. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. bioRxiv, 2017.
[2] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017. N.B. The ‘Pima Indians Diabetes’ and ‘Boston Housing’ datasets are no longer available from this source.
[3] Gregory Plumb, Jonathan Terhorst, Sriram Sankararaman, and Ameet Talwalkar. Explaining groups of points in low-dimensional representations. In International Conference on Machine Learning. PMLR, 2020.
[4] Karthik Shekhar, Sylvain W. Lapan, Irene E. Whitney, Nicholas M. Tran, Evan Macosko, Monika Kowalczyk, Xian Adiconis, Joshua Z. Levin, James Nemesh, Melissa Goldman, Steven McCarroll, Constance L. Cepko, Aviv Regev, and Joshua R. Sanes. Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell, 166:1308–1323.e30, 2016.