eiriniar / CellCnn

Representation Learning for detection of phenotype-associated cell subsets
http://www.imsb.ethz.ch/research/claassen/Software/cellcnn.html
GNU General Public License v3.0

CellCnn
=======

Installation

CellCnn was originally written in Python 2.7. For a Python 3 version, please check out this branch: https://github.com/eiriniar/CellCnn/tree/python3

There are several ways to run Python, but we recommend using a virtual environment. To set up a virtual environment, you can perform the following steps:

  1. Download the Python 2.7 installation script corresponding to your operating system from https://conda.io/miniconda.html . For example, for macOS it should be called "Miniconda2-latest-MacOSX-x86_64.sh".

  2. Run the installation script (please use the script name corresponding to your operating system): bash Miniconda2-latest-MacOSX-x86_64.sh

  3. Open a new terminal and create a virtual environment for CellCnn, e.g. "cellcnn_env": conda create --name cellcnn_env python=2.7

  4. Activate the virtual environment: source activate cellcnn_env (with conda 4.4 or later, the recommended form is: conda activate cellcnn_env)


Once Python 2.7 is running on your system, install CellCnn as follows:

  1. Clone the CellCnn repository: git clone https://github.com/eiriniar/CellCnn.git

  2. Install the CellCnn dependencies: pip install -r https://raw.githubusercontent.com/eiriniar/CellCnn/master/requirements.txt

  3. To install CellCnn, run the following command after replacing path_to_CellCnn with the actual path in your system: pip install -e path_to_CellCnn/CellCnn
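As a quick sanity check after the installation steps above, the following minimal sketch tests whether a package can be imported from the currently active environment. The module name cellCnn is an assumption based on the repository layout (CellCnn/cellCnn), and the helper is_importable is hypothetical, not part of CellCnn.

```python
def is_importable(module_name):
    """Return True if module_name can be imported in this environment."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "cellCnn" is assumed from the repository folder name; adjust if the
    # installed package is exposed under a different name.
    if is_importable("cellCnn"):
        print("cellCnn is importable")
    else:
        print("cellCnn not found; check that cellcnn_env is active")
```

If the check fails, the most likely causes are that the virtual environment is not active or that the path given to pip install -e was incorrect.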


Changed in version v0.2: we now use the lightweight flowIO package for reading mass/flow cytometry data. Thanks to the package author Scott White for pointing out this possibility!

Usage

Examples are provided in the subfolder CellCnn/cellCnn/examples.


Alternatively, for the analysis of mass/flow cytometry samples, CellCnn can be run from the command line. To get a list of command line options, please run:

python run_analysis.py --help

For a CellCnn analysis with default settings only two arguments have to be provided:

python run_analysis.py -f fcs_samples_with_labels.csv -m markers.csv

| The first input argument is a two-column CSV file, where the first column specifies input sample filenames and the second column the corresponding class labels. An example file is provided in CellCnn/cellCnn/examples/NK_fcs_samples_with_labels.csv.
| The second input argument is a CSV file containing the names of markers/channels that should be used for the analysis. An example file is provided in CellCnn/cellCnn/examples/NK_markers.csv.
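The two-column layout of the first input file can be sketched with the Python 3 standard library as shown here. This is only an illustration: the sample filenames and the label values 0 and 1 are placeholders, the helpers write_samples_csv and read_rows are hypothetical, and whether a header row is expected should be checked against the provided example file NK_fcs_samples_with_labels.csv.

```python
import csv
import os
import tempfile


def write_samples_csv(path, samples, header=None):
    """Write (filename, label) pairs as a two-column CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Optional header row; it is an assumption that one may be needed.
        if header is not None:
            writer.writerow(header)
        for filename, label in samples:
            writer.writerow([filename, label])


def read_rows(path):
    """Read the CSV file back as a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


# Placeholder sample names and class labels, for illustration only.
samples = [("sample_01.fcs", 0), ("sample_02.fcs", 1)]
out_path = os.path.join(tempfile.mkdtemp(), "fcs_samples_with_labels.csv")
write_samples_csv(out_path, samples)
rows = read_rows(out_path)
```

Reading the file back with read_rows is a simple way to confirm that every row has exactly two columns before passing the file to run_analysis.py with -f.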

For example, to perform the analysis outlined in CellCnn/cellCnn/examples/NK_cell.ipynb from the command line, you can run the following (assuming your current directory is CellCnn/cellCnn/examples):

python ../run_analysis.py -f NK_fcs_samples_with_labels.csv -m NK_markers.csv -i NK_cell_dataset/gated_NK/ -o outdir_NK --export_csv --group_a CMV- --group_b CMV+ --verbose 0

The above command performs a binary classification CellCnn analysis, exports the learned filter weights as CSV files in the directory outdir_NK/exported_filter_weights and generates result plots in outdir_NK/plots. The following plots are generated:

filter_plots
""""""""""""

training_plots
""""""""""""""

These plots are generated on the basis of samples used for model training.

In addition, the following plots are produced for each selected filter (e.g. filter i):

validation_plots
""""""""""""""""

Same as the training_plots, but generated on the basis of samples used for model validation.


After performing model training once, you can refine the plots with different cutoff values for the selected filters and cell populations. Training does not have to be repeated for this: pass the option --load_results to reuse the pre-computed results.

Another relevant argument is --export_selected_cells, which produces a CSV result file for each input FCS file and stores it in outdir/selected_cells. Rows in the CSV result file correspond to cells in the order found in the FCS input file. The CSV result file contains two columns per selected filter: the first gives the cell filter response as a continuous value, and the second contains a binary value obtained by thresholding that continuous response. This latter column indicates whether a cell belongs to the cell population selected by a particular filter.

python ../run_analysis.py -f NK_fcs_samples_with_labels.csv -m NK_markers.csv -i NK_cell_dataset/gated_NK/ -o outdir_NK --group_a CMV- --group_b CMV+ --filter_response_thres 0.3 --load_results --export_selected_cells
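The relationship between the two columns written per selected filter can be illustrated with a small sketch. It is not CellCnn's implementation: the helper threshold_filter_response is hypothetical, the response values are made up, and it is an assumption that a cell counts as selected when its response is greater than or equal to the threshold (presumably the value given with --filter_response_thres, here 0.3).

```python
def threshold_filter_response(responses, thres):
    """Map continuous filter responses to binary 0/1 indicators."""
    # Assumption: a cell is selected when its response is >= thres.
    # CellCnn itself may use a strict comparison instead.
    return [1 if response >= thres else 0 for response in responses]


# Made-up continuous responses for four cells, for illustration only.
responses = [0.05, 0.3, 0.72, 0.29]
indicators = threshold_filter_response(responses, 0.3)  # [0, 1, 1, 0]

# Fraction of cells assigned to the selected population.
selected_fraction = sum(indicators) / float(len(indicators))  # 0.5
```

Raising the threshold shrinks the selected population, which is why re-running with --load_results and a different cutoff is enough to refine the exported selection.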

Documentation

For additional information, CellCnn's documentation is hosted on http://eiriniar.github.io/CellCnn/