DaehwanKimLab / tophat

Spliced read mapper for RNA-Seq
http://ccb.jhu.edu/software/tophat
Boost Software License 1.0

What's wrong with bowtie? (--reorder option with bowtie1) #56

Open Damtagor opened 5 years ago

Damtagor commented 5 years ago

Currently, I am trying to use the Perl script from GFusion [https://github.com/xiaofengsong/GFusion].

Partway through the run, it fails because bowtie is called with the unrecognized option '--reorder' (a bowtie2 option). When I try bowtie2 indexes instead, TopHat doesn't recognize them because it only searches for bowtie (version 1) indexes. The script itself doesn't contain '--reorder' anywhere, and it is written for bowtie 1. Do you know what the problem could be?

I used this command:

perl GFusion.pl.txt -o output1 -r 0 -p 12 -i /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome -g /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf -1 test_1.fastq -2 test_2.fastq

Then an error appeared because Bowtie didn't recognize the option --reorder:

[Tue Dec  4 18:57:08 2018]

[2018-12-04 18:57:08] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-04 18:57:08] Checking for Bowtie
                  Bowtie version:        1.1.2.0
[2018-12-04 18:57:10] Checking for Bowtie index files (genome)..
[2018-12-04 18:57:10] Checking for reference FASTA file
[2018-12-04 18:57:10] Generating SAM header for /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome
[2018-12-04 18:57:31] Preparing reads
         left reads: min. length=50, max. length=50, 84131 kept reads (113 discarded)
        right reads: min. length=50, max. length=50, 83725 kept reads (519 discarded)
[2018-12-04 18:57:33] Mapping left_kept_reads to genome genome with Bowtie
        [FAILED]
Error running bowtie:
bowtie: unrecognized option '--reorder'
Usage:
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

  <m1>    Comma-separated list of files containing upstream mates (or the
          sequences themselves, if -c is set) paired with mates in <m2>
  <m2>    Comma-separated list of files containing downstream mates (or the
          sequences themselves if -c is set) paired with mates in <m1>
  <r>     Comma-separated list of files containing Crossbow-style reads.  Can be
          a mixture of paired and unpaired.  Specify "-" for stdin.
  <s>     Comma-separated list of files containing unpaired reads, or the
          sequences themselves, if -c is set.  Specify "-" for stdin.
  <hit>   File to write hits to (default: stdout)
Input:
  -q                 query input files are FASTQ .fq/.fastq (default)
  -f                 query input files are (multi-)FASTA .fa/.mfa
  -r                 query input files are raw one-sequence-per-line
  -c                 query sequences given on cmd line (as <mates>, <singles>)
  -C                 reads and index are in colorspace
  -Q/--quals <file>  QV file(s) corresponding to CSFASTA inputs; use with -f -C
  --Q1/--Q2 <file>   same as -Q, but for mate files 1 and 2 respectively
  -s/--skip <int>    skip the first <int> reads/pairs in the input
  -u/--qupto <int>   stop after first <int> reads/pairs (excl. skipped reads)
  -5/--trim5 <int>   trim <int> bases from 5' (left) end of reads
  -3/--trim3 <int>   trim <int> bases from 3' (right) end of reads
  --phred33-quals    input quals are Phred+33 (default)
  --phred64-quals    input quals are Phred+64 (same as --solexa1.3-quals)
  --solexa-quals     input quals are from GA Pipeline ver. < 1.3
  --solexa1.3-quals  input quals are from GA Pipeline ver. >= 1.3
  --integer-quals    qualities are given as space-separated integers (not ASCII)
  --large-index      force usage of a 'large' index, even if a small one is present
Alignment:
  -v <int>           report end-to-end hits w/ <=v mismatches; ignore qualities
    or
  -n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
  -e/--maqerr <int>  max sum of mismatch quals across alignment for -n (def: 70)
  -l/--seedlen <int> seed length for -n (default: 28)
  --nomaqround       disable Maq-like quality rounding for -n (nearest 10 <= 30)
  -I/--minins <int>  minimum insert size for paired-end alignment (default: 0)
  -X/--maxins <int>  maximum insert size for paired-end alignment (default: 250)
  --fr/--rf/--ff     -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
  --nofw/--norc      do not align to forward/reverse-complement reference strand
  --maxbts <int>     max # backtracks for -n 2/3 (default: 125, 800 for --best)
  --pairtries <int>  max # attempts to find mate for anchor hit (default: 100)
  -y/--tryhard       try hard to find valid alignments, at the expense of speed
  --chunkmbs <int>   max megabytes of RAM for best-first search frames (def: 64)
Reporting:
  -k <int>           report up to <int> good alignments per read (default: 1)
  -a/--all           report all alignments per read (much slower than low -k)
  -m <int>           suppress all alignments if > <int> exist (def: no limit)
  -M <int>           like -m, but reports 1 random hit (MAPQ=0); requires --best
  --best             hits guaranteed best stratum; ties broken by quality
  --strata           hits in sub-optimal strata aren't reported (requires --best)
Output:
  -t/--time          print wall-clock time taken by search phases
  -B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
  --quiet            print nothing but the alignments
  --refout           write alignments to files refXXXXX.map, 1 map per reference
  --refidx           refer to ref. seqs by 0-based index rather than name
  --al <fname>       write aligned reads/pairs to file(s) <fname>
  --un <fname>       write unaligned reads/pairs to file(s) <fname>
  --max <fname>      write reads/pairs over -m limit to file(s) <fname>
  --suppress <cols>  suppresses given columns (comma-delim'ed) in default output
  --fullref          write entire ref name (default: only up to 1st space)
Colorspace:
  --snpphred <int>   Phred penalty for SNP when decoding colorspace (def: 30)
     or
  --snpfrac <dec>    approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
  --col-cseq         print aligned colorspace seqs as colors, not decoded bases
  --col-cqual        print original colorspace quals, not decoded quals
  --col-keepends     keep nucleotides at extreme ends of decoded alignment
SAM:
  -S/--sam           write hits in SAM format
  --mapq <int>       default mapping quality (MAPQ) to print for SAM alignments
  --sam-nohead       supppress header lines (starting with @) for SAM output
  --sam-nosq         supppress @SQ header lines for SAM output
  --sam-RG <text>    add <text> (usually "lab=value") to @RG line of SAM header
Performance:
  -o/--offrate <int> override offrate of index; must be >= index's offrate
  -p/--threads <int> number of alignment threads to launch (default: 1)
  --mm               use memory-mapped I/O for index; many 'bowtie's can share
  --shmem            use shared mem for index; many 'bowtie's can share
Other:
  --seed <int>       seed for random number generator
  --verbose          verbose output (for debugging)
  --version          print version information and quit
  -h/--help          print this usage message
Command: bowtie --wrapper basic-0 -v 2 -k 20 -m 20 -S -p 12 --reorder --sam-nohead --max /dev/null /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome -

open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/unmapped.bam" for reading.
[Tue Dec  4 18:57:33 2018]
Warning: Could not find any reads in "output1/un.fastq"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie        VN:1.1.2        CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
r3      LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     !' is recognized as '*'.
[main_samview] truncated file.
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie        VN:1.1.2        CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
hr3     LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ!' is recognized as '*'.
[main_samview] truncated file.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.
[Tue Dec  4 18:57:37 2018]
 Result: No Fusion Genes!  The time elapsed: about 0 hours.
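
The usage text printed in the log above does not list `--reorder` among the options of the installed bowtie (1.1.2), which matches the "unrecognized option" error. As a minimal sketch of how that check could be automated, the Python snippet below searches a usage text for a given option. The function name `usage_lists_option` and the constant `USAGE_EXCERPT` are hypothetical names introduced for illustration; the excerpt is copied from the log above, not captured from a live run.

```python
import re


def usage_lists_option(usage_text: str, option: str) -> bool:
    """Return True if `option` appears as a whole option token in `usage_text`.

    The lookarounds stop "--sam" from matching inside "--sam-nohead".
    """
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, usage_text) is not None


# A few lines copied from the bowtie usage text shown in the log above.
USAGE_EXCERPT = """\
Output:
  -t/--time          print wall-clock time taken by search phases
  --refout           write alignments to files refXXXXX.map, 1 map per reference
SAM:
  -S/--sam           write hits in SAM format
  --sam-nohead       supppress header lines (starting with @) for SAM output
Performance:
  -p/--threads <int> number of alignment threads to launch (default: 1)
"""

if __name__ == "__main__":
    for opt in ("--reorder", "--sam-nohead"):
        print(opt, usage_lists_option(USAGE_EXCERPT, opt))
```

To run the same check against an installed binary, the text printed by `bowtie --help` (the `-h/--help` option appears in the usage above) could be passed in place of `USAGE_EXCERPT`.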

After this, I used Bowtie2 indexes:

perl GFusion.pl.txt -o output1 -r 0 -p 12 -i /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome -g /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf -1 test_1.fastq -2 test_2.fastq

But the script doesn't use that type of index:

[Tue Dec  4 19:01:33 2018]

[2018-12-04 19:01:33] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-12-04 19:01:33] Checking for Bowtie
                  Bowtie version:        1.1.2.0
[2018-12-04 19:01:33] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie index files (/mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome.*.ebwt)
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/unmapped.bam" for reading.
[Tue Dec  4 19:01:33 2018]
Could not locate a Bowtie index corresponding to basename "/mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome"
Command: bowtie --wrapper basic-0 -p 12 -S /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome output1/un.fastq output1/fusion_out/un.sam
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie        VN:1.1.2        CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
r3      LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     !' is recognized as '*'.
[main_samview] truncated file.
[samopen] SAM header is present: 195 sequences.
[sam_read1] reference 'ID:Bowtie        VN:1.1.2        CL:"bowtie --wrapper basic-0 -p 12 /mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/GRCh38/Sequence/BowtieIndex/genome output1/un.fastq -S output1/fusion_out/un.sam"
hr3     LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ!' is recognized as '*'.
[main_samview] truncated file.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
open: No such file or directory
[main_samview] fail to open "output1/accepted_hits.bam" for reading.
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.
[Tue Dec  4 19:01:35 2018]
 Result: No Fusion Genes!  The time elapsed: about 0 hours.
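
The error above shows TopHat looking for files matching `genome.*.ebwt`, the Bowtie 1 index naming, under the `Bowtie2Index` directory. Bowtie 2 index files end in `.bt2` instead, so that lookup cannot succeed there. The Python sketch below reports which kind of index exists for a given basename. The function name `index_kinds` is hypothetical, and large-index variants are deliberately not covered.

```python
import glob


def index_kinds(basename: str) -> set:
    """Return the index kinds ("bowtie1", "bowtie2") found for `basename`.

    Looks for <basename>.*.ebwt (Bowtie 1) and <basename>.*.bt2 (Bowtie 2).
    """
    # Escape the basename so glob metacharacters in the path are taken literally.
    prefix = glob.escape(basename)
    kinds = set()
    if glob.glob(prefix + ".*.ebwt"):
        kinds.add("bowtie1")
    if glob.glob(prefix + ".*.bt2"):
        kinds.add("bowtie2")
    return kinds


if __name__ == "__main__":
    # Basename taken from the second command in this post.
    base = ("/mnt/home/soft/human/data/hg38_illumina/Homo_sapiens/NCBI/"
            "GRCh38/Sequence/Bowtie2Index/genome")
    print(index_kinds(base))
```

An empty set means neither kind of index was found for that basename; a result containing only "bowtie2" would be consistent with the "Could not find Bowtie index files" message when Bowtie 1 is the aligner in use.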

I don't know exactly how to solve this. It looks like it works with hg19, but the genome build shouldn't be the problem. Thanks in advance.