YeoLab / gscripts

General Use Scripts and Helper functions
MIT License

Branch on sailfish running on paired end reads #55

Closed olgabot closed 9 years ago

olgabot commented 9 years ago

Addresses the issues here: https://github.com/YeoLab/gscripts/issues/54 and requires: https://github.com/gpratt/gatk/pull/7

olgabot commented 9 years ago

This seems to be working for me now, but I suspect that any function requiring FilterRepetitiveRegions will need to be replaced with MapRepetitiveRegions.

From a git grep, I get these functions:

(olga)[obotvinnik@tscc-1-45 gscripts]$ git grep --cached FilterRepetitiveRegions
qscripts/analyze_clip_seq.scala:  case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) extends Fil
qscripts/analyze_clip_seq.scala:       override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_miRli.scala:    case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) extends Filt
qscripts/analyze_miRli.scala:        override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_ribo_seq.scala:  case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) extends Fil
qscripts/analyze_ribo_seq.scala:       override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_rna_seq.scala:  case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) extends Filt
qscripts/analyze_rna_seq.scala:       override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_rna_seq_from_bam.scala:  case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) ext
qscripts/analyze_rna_seq_from_bam.scala:       override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_rna_seq_gently.scala:    case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) ext
qscripts/analyze_rna_seq_gently.scala:        override def shortDescription = "FilterRepetitiveRegions"

What do you think is the next move? @gpratt, you said your scripts have been running fine, but I'm suspicious that there's no function actually called FilterRepetitiveRegions.
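To turn the grep output above into a list of files that would need the FilterRepetitiveRegions to MapRepetitiveRegions change, something like the following could help. It is only a sketch: `files_with_hits` is a hypothetical helper name, and it assumes the default `git grep` output format of `path:matched text`, with no colons in the file paths.

```python
from collections import defaultdict


def files_with_hits(grep_output):
    """Group `git grep` output lines by file path.

    Assumes each line looks like `path:matched text` (the default
    `git grep` format) and that paths contain no colons.
    """
    hits = defaultdict(list)
    for line in grep_output.splitlines():
        if ":" not in line:
            continue  # skip blank lines or anything that is not a hit
        # Split on the first colon only; the matched text may contain more.
        path, _, text = line.partition(":")
        hits[path].append(text.strip())
    return dict(hits)


# A few lines copied from the grep output in this thread (truncation kept as-is).
sample = """\
qscripts/analyze_clip_seq.scala:  case class filterRepetitiveRegions(noAdapterFastq: File, filteredResults: File, filteredFastq: File) extends Fil
qscripts/analyze_clip_seq.scala:       override def shortDescription = "FilterRepetitiveRegions"
qscripts/analyze_rna_seq.scala:       override def shortDescription = "FilterRepetitiveRegions"
"""

for path, texts in sorted(files_with_hits(sample).items()):
    print(path, len(texts))
```

Piping the real `git grep --cached FilterRepetitiveRegions` output through this would give one entry per qscript, which makes it easier to tick files off as they get updated.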

olgabot commented 9 years ago

MapRepetitiveRegions completed successfully. Now sailfish_quant.py seems to be submitting, but I don't know yet whether it's getting run properly.

olgabot commented 9 years ago

STAR isn't getting the genome directory correctly. Fixing.

olgabot commented 9 years ago

There's something weird going on with the genome finding. The top of my manifest file, showing hidden characters, looks like this (^I is how cat -T displays the TAB character, so it should be the same as \t, and $ marks the end of each line):

(olga)[obotvinnik@tscc-1-45 processing_scripts]$ cat -eT singlecell_pnms_pe_v2.txt | head
/home/obotvinnik/projects/singlecell_pnms/data/M1_01_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_01_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_02_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_02_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_03_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_03_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_04_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_04_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_05_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_05_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_06_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_06_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_07_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_07_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_08_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_08_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_09_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_09_R2.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/M1_10_R1.fastq.gz^I/home/obotvinnik/projects/singlecell_pnms/data/M1_10_R2.fastq.gz^Ihg19$

This is weird because my single-end script seemed to work fine, and it also has the ^I characters.

(olga)[obotvinnik@tscc-1-45 processing_scripts]$ cat -eT singlecell_pnms_se_v1.txt | head
/home/obotvinnik/projects/singlecell_pnms/data/CVN_01_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_02_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_03_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_04_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_05_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_06_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_07_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_08_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_09_R1.fastq.gz^Ihg19$
/home/obotvinnik/projects/singlecell_pnms/data/CVN_10_R1.fastq.gz^Ihg19$

Am I missing something in the formatting of the manifest file? I remember hearing at some point that there's supposed to be a trailing tab, but maybe that's not true anymore.
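One way to check the formatting without relying on how the pipeline itself parses the manifest is to split each line on TAB and look at the field count and line ending. This is a minimal sketch: `describe_line` is a hypothetical helper, the paths are shortened stand-ins for the real ones above, and it says nothing about what the qscripts actually expect.

```python
def describe_line(line):
    """Report basic formatting facts about one manifest line."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    return {
        "n_fields": len(fields),                      # TAB-separated columns
        "trailing_tab": stripped.endswith("\t"),      # extra TAB before newline
        "carriage_return": stripped.endswith("\r"),   # Windows-style line ending
        "last_field": fields[-1],
    }


# Shortened, made-up paths in the same layout as the two manifests above.
paired = "/data/M1_01_R1.fastq.gz\t/data/M1_01_R2.fastq.gz\thg19\n"
single = "/data/CVN_01_R1.fastq.gz\thg19\n"

print(describe_line(paired))
print(describe_line(single))
```

With these layouts the paired-end line has three fields and the single-end line has two, and neither has a trailing tab. If the genome were being read from a fixed column position rather than from the last column, a paired-end line would hand back the R2 path instead of hg19. That is only a guess at the cause, not something confirmed here.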