TRON-Bioinformatics / EasyFuse

EasyFuse is a pipeline for accurate fusion gene detection from RNA-seq data.
GNU General Public License v3.0

EasyFuse


EasyFuse is a pipeline to detect fusion transcripts from paired-end RNA-seq data with high accuracy. The current version of EasyFuse combines three fusion gene detection tools (STAR-Fusion, FusionCatcher and Arriba) with a powerful read filtering strategy, stringent re-quantification of supporting reads, and machine learning to make highly accurate predictions.

Usage

Dependencies

The dependencies are listed in environment.yml. The conda environment used to run Nextflow can be created with the following command:

conda env create -f environment.yml --prefix conda_env/

Download reference data

Before running EasyFuse, download and extract the reference annotation data (~104 GB):

# Download reference archive
wget ftp://easyfuse.tron-mainz.de/easyfuse_ref_v4.tar.gz

# Extract reference archive
tar xvfz easyfuse_ref_v4.tar.gz
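Because the download is large, it can help to check free disk space before starting. The sketch below is a hypothetical helper, not part of EasyFuse; it assumes a POSIX `df -P` whose fourth column is the available space in 1K blocks. The ~104 GB quoted above covers only the download, so plan for extra room while the archive is being extracted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EasyFuse): fail early if DIR has fewer
# than NEED_GB gigabytes free. With `df -Pk`, line 2 describes DIR and
# column 4 is the available space in 1K blocks.
require_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt $((need_gb * 1024 * 1024)) ]; then
    echo "only $((avail_kb / 1024 / 1024)) GB free in $dir, need ${need_gb} GB" >&2
    return 1
  fi
}
```

For example, `require_free_gb . 104 && wget ftp://easyfuse.tron-mainz.de/easyfuse_ref_v4.tar.gz` only starts the download if the current directory has at least that much space.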

Install the nextflow pipeline

There are two alternatives: install the workflow manually, or let Nextflow fetch it from the GitHub repository.

To install manually:

git clone https://github.com/TRON-Bioinformatics/EasyFuse.git
cd EasyFuse

# To run the test script, move the reference folder to test/easyfuse_ref/
mv ../easyfuse_ref_v4/ test/easyfuse_ref/

To install with Nextflow (only available from release 2.0.1 onwards):

nextflow run tron-bioinformatics/easyfuse -r x.y.z --help

where x.y.z corresponds to an EasyFuse release.

Run the pipeline

Pass the path to the extracted reference data with the --reference parameter.

Create a tab-delimited input table that lists your paired FASTQ files, one sample per line, with the three columns sample_name, fq1 and fq2 (no header line). For example:

sample_01   /path/to/sample_01_R1.fastq.gz  /path/to/sample_01_R2.fastq.gz
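For many samples, the table can be generated and checked with a small script. The sketch below is hypothetical and not shipped with EasyFuse: `make_input_table` and `check_input_table` are illustrative names, and it assumes mates are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` as in the example row. The checker flags rows that do not have exactly three tab-separated columns (a space-separated row is a common mistake) or that point to missing files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of EasyFuse) for the input table:
# sample_name, fq1, fq2; tab-delimited; no header.
# ASSUMPTION: mates are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.

# Print one table row per R1/R2 pair found in a directory.
make_input_table() {
  local fastq_dir r1 r2 sample
  fastq_dir="$(cd "$1" && pwd)" || return 1   # write absolute paths
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                   # glob matched nothing
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "warning: no R2 mate for $r1, skipped" >&2
    fi
  done
}

# Return non-zero if any row lacks exactly three tab-separated columns
# or refers to a FASTQ file that does not exist.
check_input_table() {
  local table="$1" status=0 line_no=0 sample fq1 fq2 extra fq
  while IFS=$'\t' read -r sample fq1 fq2 extra || [ -n "$sample" ]; do
    line_no=$((line_no + 1))
    if [ -z "$fq2" ] || [ -n "$extra" ]; then
      echo "line $line_no: expected 3 tab-separated columns" >&2
      status=1
      continue
    fi
    for fq in "$fq1" "$fq2"; do
      [ -f "$fq" ] || { echo "line $line_no: file not found: $fq" >&2; status=1; }
    done
  done < "$table"
  return "$status"
}
```

Usage: `make_input_table /path/to/fastqs > input_table.tsv && check_input_table input_table.tsv`, then pass input_table.tsv to --input_files.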

If you installed the pipeline manually, start it as follows:

nextflow run main.nf \
  -profile conda \
  --reference /path/to/reference/folder \
  --input_files /path/to/input_table_file \
  --output /path/to/output_folder

Or, if you installed it via Nextflow (only available from release 2.0.1 onwards), start it as follows:

nextflow run tron-bioinformatics/easyfuse -r x.y.z \
  -profile conda \
  --reference /path/to/reference/folder \
  --input_files /path/to/input_table_file \
  --output /path/to/output_folder

Note: If you want to use a custom profile (e.g. for running jobs on a cluster), please refer to https://www.nextflow.io/docs/latest/config.html for further information.
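As an illustration of the custom-profile route, the sketch below writes an extra Nextflow config file that defines a profile named `cluster`. The profile name, the SLURM executor and the queue name `normal` are assumptions for a hypothetical site; substitute your scheduler and queue as described in the Nextflow configuration docs.

```shell
#!/usr/bin/env bash
# Sketch: write a custom Nextflow config with a hypothetical "cluster" profile.
# ASSUMPTIONS: a SLURM scheduler and a queue/partition named "normal".
config_file="${CONFIG_FILE:-cluster.config}"

cat > "$config_file" <<'EOF'
profiles {
    cluster {
        process.executor = 'slurm'   // submit each task as a SLURM job
        process.queue    = 'normal'  // hypothetical queue/partition name
    }
}
EOF

echo "wrote $config_file"
```

The file is then added with Nextflow's `-c` option and the profiles are combined, e.g. `nextflow run main.nf -c cluster.config -profile conda,cluster --reference ... --input_files ... --output ...`.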

Output format

EasyFuse creates an output folder for each input sample containing the following files:

Within the files, each line describes a candidate fusion transcript. The file fusions.csv contains all candidate fusions with their annotated features, the prediction probability assigned by the EasyFuse model, and the corresponding prediction class (positive or negative). The file fusions.pass.csv contains only the fusions predicted as positive.
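To get a quick overview across samples, the number of positive predictions per sample can be counted from the fusions.pass.csv files. This is a hypothetical sketch with two assumptions not confirmed here: each sample's fusions.pass.csv sits in a folder named after the sample somewhere below the output directory, and the file starts with one header line (adjustable via the second argument).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EasyFuse): count positive predictions per sample.
# ASSUMPTIONS:
#   - each fusions.pass.csv lives in a folder whose name is the sample name;
#   - the file begins with HEADER_LINES header lines (default 1).
count_pass_fusions() {
  local out_dir="$1" header_lines="${2:-1}"
  find "$out_dir" -type f -name 'fusions.pass.csv' | sort |
    while IFS= read -r f; do
      # Every line after the header describes one candidate fusion.
      n=$(awk -v h="$header_lines" 'NR > h { c++ } END { print c + 0 }' "$f")
      printf '%s\t%s\n' "$(basename "$(dirname "$f")")" "$n"
    done
}
```

Usage: `count_pass_fusions /path/to/output_folder` prints one tab-separated line (sample, count) per sample; pass `0` as the second argument if the files have no header.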

Column description

Overview of all features/columns annotated by EasyFuse:

Citation

If you use EasyFuse, please cite: Weber D, Ibn-Salem J, Sorn P, et al. Nat Biotechnol. 2022