algaebrown / circSTAMP_pipe

0 stars 0 forks source link

This is a pipeline that trims, maps, detect circluar RNAs, call edits and also call RIPs in circles.

Basic usage

Installation

Snakemake:

Singularity

How to run

On TSCC 2.0, or Slurm system

# install your own snakemake prior to this
conda activate snakemake
module load singularitypro
# 
snakemake -s SnakeMain.py \
    --configfile YOUR_CONFIG.yaml \
    --profile profiles/tscc2 \
    -n

On other server, you need to modify profiles/. Check snakemake documentation on how to modify it.

Preparing to run, generate config.yaml

Output

Advanced usage:

Builing Index

SnakeBuildIndex.py does the job to build all indicies needed by CIRI2 and RNA-seq pipeline. After all indices are build, you need to make a config for CIRI file. see ciriconfig_full_tscc2.yaml as an example.

Preprocessing: Getting circles

This is the most central part of this pipeline, which is to get circles. I didn't do much. Just trim and use CIRIquant pipeline. pipeline

snakemake -s SnakeMain.smk --configfile config/tao_nextera17_EV.yaml --profile profiles/tscc2 --dag output/HepG2EV_2.gtf | awk '/digraph/{y=1;}y' | dot -Tsvg > main.svg

Can be used to generate the flowchart.

normal RNA-seq preprocessing

The pipeline also tries to do general RNA-seq (find linear RNA).

pipeline

# to generate DAG
snakemake -s SnakeMain.smk --configfile config/tao_nextera17_EV.yaml --profile profiles/tscc2 --dag output/bams/HepG2EV_2.Aligned.sortedByCoord.out.bam.bai | awk '/digraph/{y=1;}y' | dot -Tsvg > rnaseq.svg

STAMP experiment:

Due to the pseudocircular reference (duplicate the sequence twice to get the BSJ) generated to map circles, calling edits is complicated. I had to aggregate the edits from the two copies in the pseudocircular reference and then call edits. The code is implemented in SnakeCircEdit.smk. To trigger it, add these following fields to your config.

see configs/circ_nextera_iter13.yaml

STAMP: # the name for STAMP libraries
  - RBM15_STAMP
  - YTHDF2_STAMP
  - U2AF2
  - IGF2BP2
  - YTHDF2

STAMP_control: # the name for APOBEC-only libraries
  - APOBEC1only
  - APOBEC1_1
  - APOBEC1_2

REF_fwd: C # C to T # Define what edit you are looking for
ALT_fwd: T

external_stamp_control: # it is possible to use APOBEC edits in a different folder
   - another_APOBEC_library
      prefix: /PATH/TO/APOBEC_CIRCLE_STAMP/WORKDIR
      sample_name: the_other_apobed

Here is what it is doing. It finds edits, and use APOBEC significant edits as background, to find motif around STAMP edits.

pipeline

# to generate DAG
snakemake -s SnakeMain.smk --configfile config/tao_nextera13.yaml --profile profiles/tscc2 --dag output/edits/homer/YTHDF2.APOBEC1_1.homer | awk '/digraph/{y=1;}y' | dot -Tsvg > stamp.svg

It uses the alignment generated from the CIRIquant pipeline and find edit in it. Here are some difference from the normal sailor pipeline

RIP experiment

The pipeline also includes code to call enriched circles. To trigger it to run, add the following:

see configs/circ_nextera_iter11.yaml

CIRCRIP_PATH: /home/hsher/bin/circRIP # where you downloaded https://github.com/bioinfolabwhu/circRIP/ # leave blank if you have replicates

RIP_comparison: # Here you specify what to use a background and what to use as input, these names have to be in manifest['Sample']
  monosome1:
    ip: PP_Monosome1
    in: PP_Input1
  monosome2:
    ip: PP_Monosome2
    in: PP_Input2
  polysome1:
    ip: PP_Polysome1
    in: PP_Input1
  polysome2:
    ip: PP_Polysome2
    in: PP_Input2

fit_overdispersion_from:
  - PP_Input1
  - PP_Input2

My way of calling enriched RIP circles

pipeline

# to generate DAG
snakemake -s SnakeMain.smk --configfile config/tao_nextera11.yaml --profile profiles/tscc2 --dag "output/RIP/Mono1.IN1.csv" | awk '/digraph/{y=1;}y' | dot -Tsvg > myrip.svg