Description

Initially, `outrigger` was made to be convenient at the expense of being modular. The simplicity of the three commands below relies directly on the folder structure and file names, which is not modular, so @alaindomissy doesn't like it :)

Proposed changes
Split `outrigger index` into three parts

The first "step" is really three steps:

Each of these could be separated out because maybe someone has already counted junction reads using a different program and they just want to detect exons! I personally have run into a problem where I wanted to just count junction reads and nothing else and realized I couldn't.

Finally, each of these steps would explicitly take files, rather than inferring them from the structure (which could lead to surprising bugs).
Explicitly define `.gtf` file differently from a `gffutils` Feature database

Importantly, step 2 of `outrigger index` is where the `.gtf` annotation file gets used, but if a file with the same name but ending in `.gtf.db` exists, then that file is presumed to be the `gffutils` database, which is bad. It'd be better to have a mutually exclusive argument that can be either `--gtf` or `--db`.
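As a rough illustration of the mutually exclusive `--gtf`/`--db` idea, here is a minimal sketch using Python's standard `argparse`. Only the two flag names come from the proposal above; the function names `build_index_parser` and `annotation_source` are hypothetical, and the parser deliberately leaves out every other option `outrigger index` takes.

```python
import argparse


def build_index_parser():
    """Hypothetical parser; only --gtf and --db come from the proposal."""
    parser = argparse.ArgumentParser(prog="outrigger index")
    # Exactly one annotation source must be given: argparse rejects the
    # command line if both flags, or neither, are supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--gtf", help="Plain-text GTF annotation file")
    source.add_argument("--db", help="Pre-built gffutils database file")
    return parser


def annotation_source(argv):
    """Return ("gtf", path) or ("db", path) without looking at the disk."""
    args = build_index_parser().parse_args(argv)
    if args.gtf is not None:
        return ("gtf", args.gtf)
    return ("db", args.db)
```

With something like this, passing both `--gtf genes.gtf` and `--db genes.gtf.db` would exit with a usage error instead of one file silently winning over the other.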
Explicitly define splice types in `outrigger validate`

Right now, `outrigger validate` looks for the structure `outrigger_output/index/$SPLICE_TYPE/events.csv`, where `$SPLICE_TYPE` becomes a variable and has to be one of two secret splice types... also bad. Better would be:

Explicitly define splice types in `outrigger psi`
Similar to `outrigger validate`, the command `outrigger psi` alone, with no arguments, looks for the files `outrigger_output/index/$SPLICE_TYPE/events.csv` and `outrigger_output/junctions/reads.csv`.

Better would be:
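One possible shape for explicit inputs, sketched with Python's standard `argparse`. Everything named here is an assumption for illustration: the flags `--junction-reads` and `--events`, the `SPLICE_TYPE=PATH` syntax, and the helper names are hypothetical and are not existing `outrigger` options. The point is only that both the splice type and the file path are spelled out by the user rather than inferred from the folder structure.

```python
import argparse


def parse_events(pairs):
    """Turn ["se=a.csv", "mxe=b.csv"] into {"se": "a.csv", "mxe": "b.csv"}."""
    events = {}
    for pair in pairs:
        splice_type, sep, path = pair.partition("=")
        if not sep or not splice_type or not path:
            raise ValueError(f"expected SPLICE_TYPE=PATH, got {pair!r}")
        events[splice_type] = path
    return events


def build_psi_parser():
    """Hypothetical parser: both flags are illustrative, not real options."""
    parser = argparse.ArgumentParser(prog="outrigger psi")
    parser.add_argument("--junction-reads", required=True,
                        help="CSV of junction read counts")
    parser.add_argument("--events", required=True, nargs="+",
                        metavar="SPLICE_TYPE=PATH",
                        help="One events.csv per explicitly named splice type")
    return parser


def psi_inputs(argv):
    """Return (junction_reads_path, {splice_type: events_path})."""
    args = build_psi_parser().parse_args(argv)
    return args.junction_reads, parse_events(args.events)
```

For example, `psi_inputs(["--junction-reads", "outrigger_output/junctions/reads.csv", "--events", "se=outrigger_output/index/se/events.csv"])` would reproduce the current default layout for a single splice type (the name `se` is itself an assumption), while making it obvious which files are being read.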