ababaian / LIONS

LIONS is a bioinformatic analysis pipeline which brings together a few pieces of software and some home-brewed scripts to annotate a paired-end RNAseq library to detect TE-initiated transcripts
GNU General Public License v3.0

Can't find bowtie 2 index files #12

Closed Alex-Nesta closed 5 years ago

Alex-Nesta commented 5 years ago

Hi,

I'm trying to set up my first run of LIONS. I have an error with bowtie 2 index files.

I am running on my university cluster, and I am not using docker.

The error:

[2019-01-16 11:34:23] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 11:34:23] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 11:34:24] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
du: cannot access `accepted_hits.bam': No such file or directory
/projects/beck-lab/alex/bin/LIONS/scripts/eastLion.sh: line 195: [: -ge: unary operator expected
Alignment probably didnt work
 ============= ERROR 10: Alignment Not Generated =============
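
A note on the "[: -ge: unary operator expected" message: bash prints this when the left-hand side of a numeric test expands to nothing, which is what happens if a size is read from a file that was never created. A minimal sketch of a more defensive check follows. The function name and the byte threshold are assumptions for illustration; this is not the actual code at line 195 of eastLion.sh.

```shell
# Hypothetical helper, not taken from eastLion.sh: succeed only if FILE
# exists and is at least MIN_BYTES long.
bam_is_big_enough() (
    file=$1
    min_bytes=$2
    # A missing file fails here instead of leaving an empty operand below.
    [ -f "$file" ] || exit 1
    # Arithmetic expansion strips any padding that wc adds around the count.
    size=$(( $(wc -c < "$file") ))
    # Quoting keeps the test well-formed even if a value were empty.
    [ "$size" -ge "$min_bytes" ]
)
```

A caller could then write something like: bam_is_big_enough accepted_hits.bam 1000 || echo "Alignment probably didn't work".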

And here is the full output. You can see I declared the bowtie2 export path:

[nestaa@helix127 LIONS]$ export BOWTIE_INDEXES=/projects/beck-lab/alex/bin/LIONS/resources/hg38/index
[nestaa@helix127 LIONS]$ export BOWTIE2_INDEXES=/projects/beck-lab/alex/bin/LIONS/resources/hg38/index
[nestaa@helix127 LIONS]$ ./lions_opt.sh 

===============================================================
========================= L I O N S ===========================
===============================================================
                             _   _
                           _/ \|/ \_
                          /\\/   \//\
                          \|/<\ />\|/   *RAWR*
                          /\   _   /\  /
                          \|/\ Y /\|/
                           \/|v-v|\/
                            \/\_/\/

 Importing default file:
      ./LIONS/controls/parameter.ctrl

 --
 running initializeLIONS.sh

 ./LIONS/scripts/initilizeLIONS.sh ...
 ==============================================================
     Project Name: MCF7vsMCF10A 
     Run Identification Number: 190116_1134
     Library List: /projects/beck-lab/alex/bin/LIONS/controls/input.list
     Genome: hg38 
     System: glitch
             cores: 1
             qsub:  (if applicable)
     LIONS base dir: /projects/beck-lab/alex/bin/LIONS
     Call Settings: oncoexaptation

 ------ Run LIONS self-check procedures ------ 

 Initialize/initializeScripts.sh found.
 ... checking scripts
     Check that LIONS scripts exist and are read/executable

... /projects/beck-lab/alex/bin/LIONS/controls/input.list found.
... /projects/beck-lab/alex/bin/LIONS/controls/parameter.ctrl found.
... /projects/beck-lab/alex/bin/LIONS/controls/system.sysctrl found.
... Initialize/initializeLIONS.sh found.
... Initialize/initializeScripts.sh found.
... Initialize/initializeBin.sh found.
... Initialize/initializeRes.sh found.
... eastLion.sh found.
... westLion.sh found.
... RNAseqPipeline/RNAseqMaster.sh found.
... RNAseqPipeline/RNAseqCoverageCalculator.sh found.
... RNAseqPipeline/RNAgetRes.sh found.
... RNAseqPipeline/RNAseqPipeline.sh found.
... RNAseqPipeline/RPKM.sh found.
... RNAseqPipeline/TE_stats.sh found.
... RNAseqPipeline/WIG_RPKM.sh found.
... RNAseqPipeline/resourceGeneration/buildResources.sh found.
... RNAseqPipeline/resourceGeneration/buildResourceGTF.sh found.
... RNAseqPipeline/resourceGeneration/RepeatMaskerGenerate.sh found.
... RNAseqPipeline/resourceGeneration/RefSeqGenerate.sh found.
... ChimericReadTool/ChimericReadTool.sh found.
... ChimericReadTool/exon_scan.sh found.
... ChimericReadTool/intervalTree.py found.
... ChimericReadTool/chimericReadSearch.py found.
... ChimericReadTool/chimIntersect.sh found.
... ChimericReadTool/chimIntLookup.R found.
... ChimericReadTool/chimSort.R found.
... ChimericAnalysis/chimGroup.R found.

 ... script check completed successfully!

 ... checking binaries
     Using default binary initilization file
 initializeBin.sh found.
 attempting to run initializeBin.sh
     Check that system software requirements are available and working.

... samtools_0.1.18 found.
... bam2fastx found.
... tophat2 found.
... bowtie2 found.
... bowtie2-build found.
... cufflinks found.
... /home/nestaa/.conda/envs/python3/bin/python found.
... java found.
... Rscript found.
... /projects/beck-lab/alex/bin/LIONS/software/wigToBigWig found.

     Check Python3 Modules are installed
 ... csv python3 module found
 ... setuptools python3 module found
 ... pysam python3 module found
 ... sys python3 module found
 ... pickle python3 module found
 ... os python3 module found
 ... pprint python3 module found
 ... threading python3 module found
 ... collections python3 module found
 ... multiprocessing python3 module found
 ... datetime python3 module found
 ... re python3 module found
 ... binary check completed successfully!

 ... checking resource: hg38 
     Check genome files, repeat files and annotations are in order

... hg38.fa found.
... rm_hg38.ucsc found.
... refseq_hg38.ucsc found.
 ... resource check completed successfully!

 ---------- Set-up Project Workspace ---------- 
 Initializing MCF7vsMCF10A Directory: /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A
/projects/beck-lab/alex/bin/LIONS/controls/input.list /projects/beck-lab/alex/bin/LIONS/controls/parameter.ctrl
 initialization completed successfully.

                     E A S T       L I O N                     

 ./LIONS/scripts/eastLion.sh 
===============================================================
  Align reads to genome and perform TE-initiation analysis

 Iteration 1: mcf10a ------------------------------------------
      run:  eastLion.sh mcf10a
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.1.bt2': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.2.bt2': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.3.bt2': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.4.bt2': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.bwa.names': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.chr.size': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.fa': File exists
ln: creating hard link `/projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/hg38.fa.fai': File exists
     ... eastLion.sh running
     Library: mcf10a
     Ouput Directory: /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a
     Working Directory: /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a
     Alignment Bypass: 1

  No previous alignment detected
  Aligning reads to the genome
     Bam (or fq) input type: 
     Bam output: mcf10a.bam
     Genome: hg38
 Running tophat2 ...
  cmd: /projects/beck-lab/alex/bin/LIONS/bin/tophat2   -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

[2019-01-16 11:34:23] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 11:34:23] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 11:34:24] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
du: cannot access `accepted_hits.bam': No such file or directory
/projects/beck-lab/alex/bin/LIONS/scripts/eastLion.sh: line 195: [: -ge: unary operator expected
Alignment probably didnt work
 ============= ERROR 10: Alignment Not Generated =============
[nestaa@helix127 LIONS]$

Additionally, I have proof that the bowtie2 indexes exist as specified by bowtie:

[nestaa@helix LIONS]$ ls ./resources/hg38/index
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.1.bt2      GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.tar.gz
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.2.bt2      hg38.1.bt2
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.3.bt2      hg38.2.bt2
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.4.bt2      hg38.3.bt2
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.rev.1.bt2  hg38.4.bt2
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.rev.2.bt2
ababaian commented 5 years ago

I had a similar issue with tophat not recognizing soft links. Making hard links seemed to fix it but I wonder if that too isn't working depending on the filesystem set-up.

On line 88 of eastLion.sh, can you change "ln $RESOURCES/genome/* $WORK" to "cp -f $RESOURCES/genome/* $WORK"?

This is a non-optimal solution if you're running 100s of samples, but for now would you be able to test this?
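
For reference, the hard-link-or-copy idea could be sketched roughly as below. This is only an illustration: stage_genome_files is a hypothetical helper, not part of LIONS, and the two directory arguments stand in for $RESOURCES/genome and $WORK.

```shell
# Hypothetical helper: place every regular file from SRC_DIR into DEST_DIR,
# preferring hard links and falling back to copies.
stage_genome_files() (
    src_dir=$1
    dest_dir=$2
    status=0
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        target=$dest_dir/$(basename "$f")
        # Leave files staged by an earlier run alone (avoids "File exists").
        [ -e "$target" ] && continue
        # Hard links fail across filesystems; fall back to a copy then.
        ln "$f" "$target" 2>/dev/null || cp "$f" "$target" || status=1
    done
    exit "$status"
)
```

Because it skips targets that already exist, a stale or incomplete file from an earlier run would be kept as-is, so this does not replace a check of the index itself.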

Alex-Nesta commented 5 years ago
[nestaa@helix LIONS]$ export BOWTIE2_INDEXES=/projects/beck-lab/alex/bin/LIONS/resources/hg38/index
[nestaa@helix LIONS]$ export BOWTIE_INDEXES=/projects/beck-lab/alex/bin/LIONS/resources/hg38/index
[nestaa@helix LIONS]$ ./lions_opt.sh 
[2019-01-16 14:29:26] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 14:29:26] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 14:29:26] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
du: cannot access `accepted_hits.bam': No such file or directory
/projects/beck-lab/alex/bin/LIONS/scripts/eastLion.sh: line 195: [: -ge: unary operator expected
Alignment probably didnt work
 ============= ERROR 10: Alignment Not Generated =============
[nestaa@helix LIONS]$ sed -n 85,90p ./scripts/eastLion.sh 
    export WORK=$outDir # work in output space

    # BT2 Genome Index (link to work space)
    cp -f $RESOURCES/genome/* $WORK

    # Bam input file (link)
[nestaa@helix LIONS]$ 

Unfortunately that quick fix didn't seem to work... are the .bt2 files generated by LIONS or are they supposed to be downloaded and placed in $RESOURCES/genome/ ?

ababaian commented 5 years ago

The initial start-up scripts will check for a bowtie2 index (.bt2) and generate it if it doesn't exist. When you go into the LIONS/projects/projName/mcf10a folder, are the bowtie2 index files present there? Can you run ls -alh in that directory?

If you cd into that directory, you should be able to run the tophat2 command exactly as it was printed out:

cmd: /projects/beck-lab/alex/bin/LIONS/bin/tophat2 -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

Alex-Nesta commented 5 years ago
[nestaa@helix LIONS]$ cd projects/MCF7vsMCF10A/mcf10a/
[nestaa@helix mcf10a]$ ls
hg38.1.bt2  hg38.2.bt2  hg38.3.bt2  hg38.4.bt2  hg38.bwa.names  hg38.chr.size  hg38.fa  hg38.fa.fai  logs  temp.1.fq.gz  temp.2.fq.gz  tmp  tophat_out
[nestaa@helix mcf10a]$ ls -alh
total 6.3G
drwxr-sr-x 5 nestaa beck-lab  360 Jan 16 14:35 .
drwxr-sr-x 6 nestaa beck-lab  991 Jan 16 14:35 ..
-rw-r--r-- 2 nestaa beck-lab 938M Jan 16 14:35 hg38.1.bt2
-rw-r--r-- 2 nestaa beck-lab 700M Jan 16 14:35 hg38.2.bt2
-rw-r--r-- 2 nestaa beck-lab  11K Jan 16 14:35 hg38.3.bt2
-rw-r--r-- 2 nestaa beck-lab 700M Jan 16 14:35 hg38.4.bt2
-rw-r--r-- 2 nestaa beck-lab  16K Jan 15 11:57 hg38.bwa.names
-rw-r--r-- 2 nestaa beck-lab  12K Jan 15 11:57 hg38.chr.size
-rw-r--r-- 2 nestaa beck-lab 3.1G Jan 16  2014 hg38.fa
-rw-r--r-- 2 nestaa beck-lab  19K Jan 15 11:57 hg38.fa.fai
drwxr-sr-x 2 nestaa beck-lab   53 Jan 16 13:01 logs
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 14:35 temp.1.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/mcf10a_R1_001.fastq
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 14:35 temp.2.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/mcf10a_R2_001.fastq
drwxr-sr-x 2 nestaa beck-lab    0 Jan 15 18:34 tmp
drwxr-sr-x 4 nestaa beck-lab   43 Jan 16 13:06 tophat_out
[nestaa@helix mcf10a]$ /projects/beck-lab/alex/bin/LIONS/bin/tophat2 -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

[2019-01-16 15:39:24] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 15:39:24] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 15:39:24] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
[nestaa@helix mcf10a]$ 

hmm, doesn't seem to be permissions related....

[nestaa@helix mcf10a]$ chmod 775 hg38.*
[nestaa@helix mcf10a]$ ls
hg38.1.bt2  hg38.2.bt2  hg38.3.bt2  hg38.4.bt2  hg38.bwa.names  hg38.chr.size  hg38.fa  hg38.fa.fai  logs  temp.1.fq.gz  temp.2.fq.gz  tmp  tophat_out
[nestaa@helix mcf10a]$ /projects/beck-lab/alex/bin/LIONS/bin/tophat2 -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

[2019-01-16 15:40:46] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 15:40:46] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 15:40:46] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
[nestaa@helix mcf10a]$ chmod 775 temp.*
[nestaa@helix mcf10a]$ /projects/beck-lab/alex/bin/LIONS/bin/tophat2 -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

[2019-01-16 15:40:58] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 15:40:58] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 15:40:58] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (hg38.*.bt2)
[nestaa@helix mcf10a]$ ls -alh
total 6.3G
drwxr-sr-x 5 nestaa beck-lab  360 Jan 16 14:35 .
drwxr-sr-x 6 nestaa beck-lab  991 Jan 16 14:35 ..
-rwxrwxr-x 2 nestaa beck-lab 938M Jan 16 14:35 hg38.1.bt2
-rwxrwxr-x 2 nestaa beck-lab 700M Jan 16 14:35 hg38.2.bt2
-rwxrwxr-x 2 nestaa beck-lab  11K Jan 16 14:35 hg38.3.bt2
-rwxrwxr-x 2 nestaa beck-lab 700M Jan 16 14:35 hg38.4.bt2
-rwxrwxr-x 2 nestaa beck-lab  16K Jan 15 11:57 hg38.bwa.names
-rwxrwxr-x 2 nestaa beck-lab  12K Jan 15 11:57 hg38.chr.size
-rwxrwxr-x 2 nestaa beck-lab 3.1G Jan 16  2014 hg38.fa
-rwxrwxr-x 2 nestaa beck-lab  19K Jan 15 11:57 hg38.fa.fai
drwxr-sr-x 2 nestaa beck-lab   53 Jan 16 15:40 logs
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 14:35 temp.1.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/mcf10a_R1_001.fastq
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 14:35 temp.2.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/mcf10a_R2_001.fastq
drwxr-sr-x 2 nestaa beck-lab    0 Jan 15 18:34 tmp
drwxr-sr-x 4 nestaa beck-lab   43 Jan 16 13:06 tophat_out
[nestaa@helix mcf10a]$ 
julienrichardalbert commented 5 years ago

I believe Bowtie2 is looking for "hg38.x.bt2" within the resources directory. Maybe changing the names of your indices "GCA_000001405..." will fix the problem? I am not a developer, just a nosy user!

Alex-Nesta commented 5 years ago

I believe Bowtie2 is looking for "hg38.x.bt2" within the resources directory. Maybe changing the names of your indices "GCA_000001405..." will fix the problem? I am not a developer, just a nosy user!

Thanks for the input, I think if you take another look you'll see that I tried that and have both of those files in the directory.

So far, if you look at my most recent code block in my most recent post: it looks like the .bt2 files are in the appropriate directory and were moved there by LIONS. However, for some reason tophat doesn't see them.

[nestaa@helix logs]$ less run.log

/opt/compsci/tophat/2.0.13/bin/tophat -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz
run.log (END) 
ababaian commented 5 years ago

It looks like more recent versions of tophat2 / bowtie2 use a different index. See here

Earlier bowtie version like 2.0. series used to give bt2 indices for GRCh38 not bt2l. But if you index the same GRCh38 genome by Bowtie2 2.2.9 as used in your tophat at present, it will give .bt2l indices. Now, when you ran tophat, while verifying inputs, as per that size of the fasta file, the new version expects bt2l indices be present and consequently the error.

Can you try manually making a bowtie2 index that yields .bt2l file and running the TH2 command manually again? If so, this is fixable but will require a little bit of code.

Alex-Nesta commented 5 years ago

hey, so I manually created the bowtie index and the command worked. It did NOT make .bt2l files, just bt2, note the differences after running ls -lha. Two extra index files and they are also different sizes. I'm not totally sure what to make of this as I haven't dug into the LIONS code...

Do you have any ideas to make the lions code work with this version of bowtie/tophat? What are the EXACT program versions known to work?

command to build index:

bowtie2-build hg38.fa hg38
[nestaa@helix mcf10a]$ ls -lha
total 7.9G
drwxr-sr-x 5 nestaa beck-lab  450 Jan 16 20:23 .
drwxr-sr-x 6 nestaa beck-lab 1.1K Jan 16 15:44 ..
-rwxrwxr-x 2 nestaa beck-lab 974M Jan 16 20:22 hg38.1.bt2
-rwxrwxr-x 2 nestaa beck-lab 728M Jan 16 20:22 hg38.2.bt2
-rwxrwxr-x 2 nestaa beck-lab  15K Jan 16 19:20 hg38.3.bt2
-rwxrwxr-x 2 nestaa beck-lab 728M Jan 16 19:20 hg38.4.bt2
-rwxrwxr-x 2 nestaa beck-lab  16K Jan 15 11:57 hg38.bwa.names
-rwxrwxr-x 2 nestaa beck-lab  12K Jan 15 11:57 hg38.chr.size
-rwxrwxr-x 2 nestaa beck-lab 3.1G Jan 16  2014 hg38.fa
-rwxrwxr-x 2 nestaa beck-lab  19K Jan 15 11:57 hg38.fa.fai
-rw-r--r-- 1 nestaa beck-lab 802M Jan 16 21:22 hg38.rev.1.bt2
-rw-r--r-- 1 nestaa beck-lab 602M Jan 16 21:22 hg38.rev.2.bt2
drwxr-sr-x 2 nestaa beck-lab   53 Jan 16 21:26 logs
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 15:44 temp.1.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/16JGM-021-JAX-MCF10-1-GT16-05424-TGACCA_S1_R1_001.fastq
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 15:44 temp.2.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/16JGM-021-JAX-MCF10-1-GT16-05424-TGACCA_S1_R2_001.fastq
-rw-r--r-- 1 nestaa beck-lab  335 Jan 16 19:19 test.pbs
drwxr-sr-x 2 nestaa beck-lab   36 Jan 16 21:25 tmp
drwxr-sr-x 4 nestaa beck-lab   43 Jan 16 13:06 tophat_out
[nestaa@helix MCF7vsMCF10A]$ cd mcf10a/
[nestaa@helix mcf10a]$ ls
hg38.1.bt2  hg38.bwa.names  hg38.rev.1.bt2  temp.2.fq.gz
hg38.2.bt2  hg38.chr.size   hg38.rev.2.bt2  test.pbs
hg38.3.bt2  hg38.fa         logs            tmp
hg38.4.bt2  hg38.fa.fai     temp.1.fq.gz    tophat_out
[nestaa@helix mcf10a]$ /opt/compsci/tophat/2.0.13/bin/tophat -p 1 -r 76 --report-secondary-alignments -o /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a hg38 /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.1.fq.gz /projects/beck-lab/alex/bin/LIONS/projects/MCF7vsMCF10A/mcf10a/temp.2.fq.gz

[2019-01-16 21:25:05] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2019-01-16 21:25:05] Checking for Bowtie
          Bowtie version:    2.3.1.0
[2019-01-16 21:25:06] Checking for Bowtie index files (genome)..
[2019-01-16 21:25:06] Checking for reference FASTA file
[2019-01-16 21:25:06] Generating SAM header for hg38
biscuit13161 commented 5 years ago

Alex,

You have to run bowtie2-build with the --large-index option to make .bt2l files. Otherwise, even the newest version creates .bt2 files.

Hth, Richard Thompson

ababaian commented 5 years ago

Here are the versions I'm running:

artem@glitch[artem] tophat2 --version                                                     [12:38PM]
TopHat v2.1.1
artem@glitch[artem] bowtie2 --version                                                     [12:38PM]
/usr/bin/bowtie2-align-s version 2.3.4.1
64-bit
Alex-Nesta commented 5 years ago

Ok, I think I figured out what is going on here:

Issue 1:

Error: Could not find Bowtie 2 index files (hg38.*.bt2)

Looks like the index was not generated completely during the first LIONS run (my ssh client disconnected from the cluster when I walked away). I think LIONS did not recognize the incomplete index and tried to proceed anyway, causing the missing index error.

After manual generation of the (bt2) index files, this issue is solved. Maybe there is a way LIONS can more thoroughly check the integrity of the index files.
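
One possible shape for such a check, as a rough sketch: a small Bowtie 2 index built by bowtie2-build consists of six files (.1, .2, .3, .4, .rev.1 and .rev.2, each ending in .bt2), so a script can refuse to continue unless all six exist and are non-empty. The function name is hypothetical and this is not the check LIONS actually uses.

```shell
# Hypothetical helper: succeed only if all six files of a small Bowtie 2
# index named PREFIX are present and non-empty in DIR.
bt2_index_complete() (
    dir=$1
    prefix=$2
    for part in 1 2 3 4 rev.1 rev.2; do
        # -s is true only for files that exist and have a size above zero.
        [ -s "$dir/$prefix.$part.bt2" ] || exit 1
    done
)
```

For example: bt2_index_complete "$WORK" hg38 || echo "index incomplete, rebuild it". A non-empty file can still be truncated, so this only catches missing or empty pieces.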

Issue 2:

Error: Could not find Bowtie 2 index files (hg38.*.bt2l)

bt2l indexes

After I upgraded my bowtie and tophat to match ababaian's above, LIONS started asking for bt2l, even though it generated just bt2 index files. I believe this IS a bug that needs some fixing.
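
A sketch of how a wrapper script might cope with both formats: look at which suffix is actually on disk rather than assuming one. Bowtie 2 uses .bt2 for small indexes and .bt2l for large ones. bt2_index_suffix is a hypothetical helper, not LIONS code.

```shell
# Hypothetical helper: print the suffix ("bt2" or "bt2l") of the index
# named PREFIX found in DIR, or fail if neither form is present.
bt2_index_suffix() (
    dir=$1
    prefix=$2
    # The small format is tried first, so it wins if both are present.
    for suffix in bt2 bt2l; do
        # Check the first and last pieces of the index for this suffix.
        if [ -s "$dir/$prefix.1.$suffix" ] && [ -s "$dir/$prefix.rev.2.$suffix" ]; then
            printf '%s\n' "$suffix"
            exit 0
        fi
    done
    exit 1
)
```

For example: suffix=$(bt2_index_suffix "$WORK" hg38) || echo "no usable index found".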

Issue 3: Now that that is all sorted, I am sad to report yet another issue. Please see below.

[2019-01-17 19:54:57] Checking for Bowtie
                  Bowtie version:        2.3.4.1
[2019-01-17 19:54:57] Checking for Bowtie index files (genome)..
[2019-01-17 19:54:57] Checking for reference FASTA file
[2019-01-17 19:54:57] Generating SAM header for hg38
Traceback (most recent call last):
  File "/projects/beck-lab/alex/bin/tophat-2.1.1.Linux_x86_64/tophat", line 4107, in <module>
    sys.exit(main())
  File "/projects/beck-lab/alex/bin/tophat-2.1.1.Linux_x86_64/tophat", line 3961, in main
    params.read_params = check_reads_format(params, reads_list)
  File "/projects/beck-lab/alex/bin/tophat-2.1.1.Linux_x86_64/tophat", line 1859, in check_reads_format
    freader=FastxReader(zf.file, params.read_params.color, zf.fname)
  File "/projects/beck-lab/alex/bin/tophat-2.1.1.Linux_x86_64/tophat", line 1599, in __init__
    while hlines>0 and self.lastline[0] not in "@>" :
IndexError: string index out of range
du: cannot access `accepted_hits.bam': No such file or directory
/projects/beck-lab/alex/bin/LIONS/scripts/eastLion.sh: line 195: [: -ge: unary operator expected

I believe this may be an issue with tophat and not LIONS, but if you have any ideas please let me know.

I think it might be because LIONS is linking to non-gz fastq files as gz files. But I need to test this. See below:

[nestaa@helix mcf10a]$ ls -lha
total 7.9G
drwxr-sr-x 5 nestaa beck-lab  450 Jan 16 20:23 .
drwxr-sr-x 6 nestaa beck-lab 1.1K Jan 16 15:44 ..
-rwxrwxr-x 2 nestaa beck-lab 974M Jan 16 20:22 hg38.1.bt2
-rwxrwxr-x 2 nestaa beck-lab 728M Jan 16 20:22 hg38.2.bt2
-rwxrwxr-x 2 nestaa beck-lab  15K Jan 16 19:20 hg38.3.bt2
-rwxrwxr-x 2 nestaa beck-lab 728M Jan 16 19:20 hg38.4.bt2
-rwxrwxr-x 2 nestaa beck-lab  16K Jan 15 11:57 hg38.bwa.names
-rwxrwxr-x 2 nestaa beck-lab  12K Jan 15 11:57 hg38.chr.size
-rwxrwxr-x 2 nestaa beck-lab 3.1G Jan 16  2014 hg38.fa
-rwxrwxr-x 2 nestaa beck-lab  19K Jan 15 11:57 hg38.fa.fai
-rw-r--r-- 1 nestaa beck-lab 802M Jan 16 21:22 hg38.rev.1.bt2
-rw-r--r-- 1 nestaa beck-lab 602M Jan 16 21:22 hg38.rev.2.bt2
drwxr-sr-x 2 nestaa beck-lab   53 Jan 16 21:26 logs
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 15:44 temp.1.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/16JGM-021-JAX-MCF10-1-GT16-05424-TGACCA_S1_R1_001.fastq
lrwxrwxrwx 1 nestaa beck-lab  117 Jan 16 15:44 temp.2.fq.gz -> /projects/beck-lab/alex/Breast_Cancer/rna-seq/MCF10JGM/16JGM-021-JAX-MCF10-1-GT16-05424-TGACCA_S1_R2_001.fastq
-rw-r--r-- 1 nestaa beck-lab  335 Jan 16 19:19 test.pbs
drwxr-sr-x 2 nestaa beck-lab   36 Jan 16 21:25 tmp
drwxr-sr-x 4 nestaa beck-lab   43 Jan 16 13:06 tophat_out

EDIT: yes, I think the lack of gz compression on the fastq was the issue... Will update tomorrow AM.

ababaian commented 5 years ago

Issue 1: I've changed the internal check to look for the last index file rev.2 instead of the first one to avoid this if possible.

Issue 2: bt2l files are automatically generated for reference genomes >4Gb. I never had this issue before but it's reasonable. LIONS now will deal with bt2 and bt2l files accordingly. Thanks for the bug fix suggestion =D

Issue 3: You're a caveman to not compress your fastq files (joking). This looks like a slightly less trivial fix script-wise. May I kindly suggest you zip your fastq files and call it a day? I'll need more time to rewrite eastLion.sh to deal with both gz and non-gz fastqs. @biscuit13161 do you see a quick fix in eastLion?
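
A rough sketch of one way to handle both cases: gzip files start with the two magic bytes 1f 8b, so a script can peek at the start of each FASTQ and choose the link name to match. link_fastq and its arguments are hypothetical, not the eastLion.sh implementation, and the idea that TopHat decides whether to decompress based on the .gz extension is an assumption drawn from the traceback above.

```shell
# Hypothetical helper: symlink FASTQ into WORK_DIR as NAME.fq.gz if it is
# gzip-compressed, or as NAME.fq otherwise, and print the link path.
# FASTQ should be an absolute path so the symlink resolves from WORK_DIR.
link_fastq() (
    fastq=$1
    work_dir=$2
    name=$3
    # Read the first two bytes as hex; gzip data always begins with 1f 8b.
    magic=$(od -An -tx1 -N2 "$fastq" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        link=$work_dir/$name.fq.gz
    else
        link=$work_dir/$name.fq
    fi
    ln -sf "$fastq" "$link" && printf '%s\n' "$link"
)
```

The printed path would then be passed to tophat2 in place of the hard-coded temp.1.fq.gz and temp.2.fq.gz names.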

Alex-Nesta commented 5 years ago

After recent submissions to repo, this issue is solved. Thanks for all of your help.