YeoLab / outrigger

Create a *de novo* alternative splicing database, validate splicing events, and quantify percent spliced-in (Psi) from RNA-seq data
http://yeolab.github.io/outrigger/
BSD 3-Clause "New" or "Revised" License

Be explicit about inputs/outputs #78

Open: opened by olgabot 7 years ago

Description

Initially, outrigger was designed to be convenient at the expense of being modular. The simplicity of the three commands below depends on a fixed folder structure and fixed file names rather than on explicit inputs, which is not modular, so @alaindomissy doesn't like it :)

outrigger index --sj-out-tab *SJ.out.tab \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
outrigger validate --genome mm10 \
    --fasta /projects/ps-yeolab/genomes/mm10/GRCm38.primary_assembly.genome.fa
outrigger psi

Proposed changes

Split outrigger index into three parts

The first "step" is really three steps:

  1. Count junction reads and output this as a file
  2. Detect exons and output this as a file
  3. Search for alternative exons
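To illustrate step 1 standing on its own, here is a minimal sketch that reads STAR `SJ.out.tab` files and writes one junction-reads CSV. The function names `count_junction_reads` and `sample_id_from_path`, the sample-naming rule, and the output columns (`sample_id`, `junction`, `reads`) are hypothetical and are not necessarily outrigger's actual `reads.csv` schema. It assumes STAR's standard nine-column layout, in which column 4 is the strand code and column 7 is the number of uniquely mapping reads crossing the junction.

```python
import csv
from pathlib import Path
from typing import Iterable

# STAR's SJ.out.tab column 4 encodes strand as 0 (undefined), 1 (+) or 2 (-).
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}


def sample_id_from_path(path: str) -> str:
    """Hypothetical naming rule: 'sample1.SJ.out.tab' -> 'sample1'."""
    name = Path(path).name
    return name.replace("SJ.out.tab", "").rstrip("._") or name


def count_junction_reads(sj_out_tabs: Iterable[str], output_csv: str,
                         min_unique_reads: int = 1) -> int:
    """Step 1 only: explicit SJ.out.tab inputs -> one junction-reads CSV.

    Returns the number of (sample, junction) rows written.
    """
    rows = []
    for path in sj_out_tabs:
        sample_id = sample_id_from_path(path)
        with open(path, newline="") as handle:
            for fields in csv.reader(handle, delimiter="\t"):
                if not fields:  # skip blank lines
                    continue
                chrom, start, end, strand = fields[:4]
                unique_reads = int(fields[6])  # column 7: uniquely mapping reads
                if unique_reads < min_unique_reads:
                    continue
                junction = f"{chrom}:{start}-{end}:{STAR_STRAND[strand]}"
                rows.append((sample_id, junction, unique_reads))

    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample_id", "junction", "reads"])
        writer.writerows(rows)
    return len(rows)
```

For example, `count_junction_reads(["sample1.SJ.out.tab", "sample2.SJ.out.tab"], "reads.csv")` would produce a table that the exon-detection step could consume, whether it came from this function or from another junction-counting tool.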

Each of these could be separated out, because someone may have already counted junction reads with a different program and just want to detect exons! I personally have run into a situation where I wanted to count junction reads and nothing else, and realized I couldn't.

Finally, each of these steps would explicitly take input files, rather than inferring them from the output folder structure (which could lead to surprising bugs).
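One way to make each index step name its inputs and outputs is a separate subcommand per step. The following is a sketch using Python's `argparse`; the subcommand names (`count-junctions`, `detect-exons`, `find-events`) and flags other than `--sj-out-tab` and `--gtf` are hypothetical, and which inputs step 3 actually needs is a guess.

```python
import argparse


def build_index_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI where each index step names its inputs and outputs."""
    parser = argparse.ArgumentParser(prog="outrigger")
    steps = parser.add_subparsers(dest="command", required=True)

    # Step 1: SJ.out.tab files in, junction-read table out.
    count = steps.add_parser("count-junctions")
    count.add_argument("--sj-out-tab", nargs="+", required=True)
    count.add_argument("--output", required=True)

    # Step 2: junction reads + annotation in, exon table out.
    exons = steps.add_parser("detect-exons")
    exons.add_argument("--junction-reads", required=True)
    exons.add_argument("--gtf", required=True)
    exons.add_argument("--output", required=True)

    # Step 3: junction reads + exons in, alternative events out.
    events = steps.add_parser("find-events")
    events.add_argument("--junction-reads", required=True)
    events.add_argument("--exons", required=True)
    events.add_argument("--output", required=True)
    return parser
```

Because every path is an argument, a missing input fails at parse time instead of silently resolving to whatever happens to sit in the output folder.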

Explicitly distinguish a .gtf file from a gffutils feature database

Importantly, step 2 of outrigger index is where the .gtf annotation file gets used. However, if a file with the same name plus a .db extension (i.e. .gtf.db) exists, that file is presumed to be the gffutils database, which is bad. It'd be better to have mutually exclusive --gtf/--db arguments, so that exactly one of them is given.
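A minimal sketch of the mutually exclusive option, using `argparse`'s `add_mutually_exclusive_group(required=True)`, which rejects both "neither" and "both". The program name and the helper `annotation_source` are hypothetical.

```python
import argparse


def build_detect_exons_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: exactly one annotation source, never inferred from file names."""
    parser = argparse.ArgumentParser(prog="outrigger detect-exons")
    annotation = parser.add_mutually_exclusive_group(required=True)
    annotation.add_argument(
        "--gtf", help="GTF annotation; a fresh gffutils database would be built from it")
    annotation.add_argument(
        "--db", help="existing gffutils feature database, used as-is")
    return parser


def annotation_source(argv) -> tuple:
    """Return ('gtf', path) or ('db', path) so callers branch on an explicit choice."""
    args = build_detect_exons_parser().parse_args(argv)
    return ("db", args.db) if args.db is not None else ("gtf", args.gtf)
```

The `db` branch could then open the file with `gffutils.FeatureDB(path)` and the `gtf` branch could call `gffutils.create_db(path, dbfn=...)`. Both gffutils calls exist, but how outrigger would wire them up is not shown here.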

Explicitly define splice types in outrigger validate

Right now, outrigger validate looks for files matching the pattern outrigger_output/index/$SPLICE_TYPE/events.csv, where $SPLICE_TYPE has to be one of two secret splice types (se or mxe)... also bad. Better would be:

outrigger validate --se outrigger_output/index/se/events.csv --mxe outrigger_output/index/mxe/events.csv
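A sketch of how the proposed `--se`/`--mxe` flags could be turned into an explicit `{splice_type: path}` mapping, requiring at least one. The function name is hypothetical, and the existing `--genome`/`--fasta` options would sit alongside these flags but are left out to keep the example short.

```python
import argparse

SPLICE_TYPES = ("se", "mxe")  # the two splice types named in this issue


def parse_validate_args(argv) -> dict:
    """Hypothetical: map each given splice type to its events.csv path."""
    parser = argparse.ArgumentParser(prog="outrigger validate")
    for splice_type in SPLICE_TYPES:
        parser.add_argument(f"--{splice_type}", metavar="EVENTS_CSV",
                            help=f"events.csv for {splice_type.upper()} events")
    args = parser.parse_args(argv)
    events = {t: getattr(args, t) for t in SPLICE_TYPES if getattr(args, t)}
    if not events:
        parser.error("give at least one of --se/--mxe")  # exits with status 2
    return events
```

Validating only SE events then becomes `--se path/to/events.csv` with no assumptions about where that file lives.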

Explicitly define splice types in outrigger psi

Similar to outrigger validate, running outrigger psi with no arguments looks for the files outrigger_output/index/$SPLICE_TYPE/events.csv and outrigger_output/junctions/reads.csv.
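To make the hidden convention concrete, here is a reconstruction of the path lookup described above (not outrigger's actual code). The function name is hypothetical, and the lowercase `se`/`mxe` directory names are taken from the commands proposed in this issue.

```python
from pathlib import Path

SPLICE_TYPES = ("se", "mxe")


def implicit_psi_inputs(output_dir: str = "outrigger_output"):
    """Sketch of the current convention: input paths are derived, never passed in."""
    root = Path(output_dir)
    events = {t: root / "index" / t / "events.csv" for t in SPLICE_TYPES}
    reads = root / "junctions" / "reads.csv"
    return events, reads
```

Nothing in the command line tells the user which files were read, which is exactly what the explicit flags would fix.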

Better would be:

outrigger psi --se outrigger_output/index/se/events.csv --mxe outrigger_output/index/mxe/events.csv --reads outrigger_output/junctions/reads.csv
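A sketch of the proposed `outrigger psi` interface that also checks, at parse time, that every named file exists. The `existing_file` type and `parse_psi_args` are hypothetical helpers; `argparse.ArgumentTypeError` raised from a `type=` callable is reported by argparse as a normal usage error.

```python
import argparse
from pathlib import Path


def existing_file(value: str) -> Path:
    """argparse type: fail at parse time if the file is missing."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"no such file: {value}")
    return path


def parse_psi_args(argv):
    """Hypothetical: return (reads_path, {splice_type: events_path})."""
    parser = argparse.ArgumentParser(prog="outrigger psi")
    parser.add_argument("--reads", type=existing_file, required=True,
                        help="junction reads CSV")
    parser.add_argument("--se", type=existing_file, help="SE events.csv")
    parser.add_argument("--mxe", type=existing_file, help="MXE events.csv")
    args = parser.parse_args(argv)
    if args.se is None and args.mxe is None:
        parser.error("give at least one of --se/--mxe")
    events = {t: p for t, p in (("se", args.se), ("mxe", args.mxe)) if p is not None}
    return args.reads, events
```

Checking existence up front means a typo in a path stops the run immediately instead of surfacing later as an empty or partial Psi table.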