abramovitchMSU / SPARTA_Docs_and_Tutorial


.sam files have zero bytes when created with SPARTA - example Dataset #1

Open mmelendrez opened 6 years ago

mmelendrez commented 6 years ago

macOS Sierra 10.12.6

Everything appears to be fine. Moving on.

Creating a folder with which all the generated data analysis will be placed. Default subfolder location will be in 'RNAseq_Data' which is located on the Desktop
Is the RNAseq data in a folder on the Desktop? (Y or N):Y
What is the name of the folder on the Desktop containing the RNAseq data?:ExampleData

* TrimmomaticSE appeared to run correctly:

What is the name of the folder on the Desktop containing the RNAseq data?:ExampleData
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/gly5a.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedgly5a.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 97349 (97.35%) Dropped: 2651 (2.65%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/gly5b.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedgly5b.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 97391 (97.39%) Dropped: 2609 (2.61%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/gly7a.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedgly7a.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96867 (96.87%) Dropped: 3133 (3.13%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/gly7b.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedgly7b.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96608 (96.61%) Dropped: 3392 (3.39%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/pyr5a.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedpyr5a.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96615 (96.61%) Dropped: 3385 (3.38%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/pyr5b.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedpyr5b.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96647 (96.65%) Dropped: 3353 (3.35%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/pyr7a.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedpyr7a.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96201 (96.20%) Dropped: 3799 (3.80%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 2 /Users/mel_local/Desktop/ExampleData/pyr7b.fq.gz /Users/mel_local/Desktop/RNAseq_Data/2018-03-02/QC/trimmedpyr7b.fq.gz ILLUMINACLIP:/Users/mel_local/Desktop/SPARTA_Mac-master/QC_analysis/Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 100000 Surviving: 96345 (96.34%) Dropped: 3655 (3.65%)
TrimmomaticSE: Completed successfully
FastQC is assessing your data set for overall quality

* FastQC, Bowtie and HTSeq appeared to run, except that when it reached HTSeq the output reported `bad interpreter` once for each of the eight files:

FastQC is assessing your data set for overall quality
Building the Bowtie index from the reference genome
Mapping reads to the reference genome with Bowtie
Counting gene features with HTSeq
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
/bin/sh: ./htseq-count: /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Content: bad interpreter: No such file or directory
SPARTA has these files:
1) mapgly5a.sam
2) mapgly5b.sam
3) mapgly7a.sam
4) mapgly7b.sam
5) mappyr5a.sam
6) mappyr5b.sam
7) mappyr7a.sam
8) mappyr7b.sam
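A `bad interpreter` message from `/bin/sh` generally means that the `#!` line at the top of the script names an interpreter that does not exist at that path. The sketch below is one way to check this for any script; it is not part of SPARTA, and the function names `read_shebang` and `interpreter_exists` are hypothetical helpers introduced only for illustration.

```python
import os


def read_shebang(script_path):
    """Return the interpreter path from a script's '#!' line, or None if absent."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    # Keep only the interpreter itself; anything after it is an argument.
    parts = first_line[2:].strip().split()
    return parts[0] if parts else None


def interpreter_exists(script_path):
    """True if the script has a '#!' line whose interpreter path exists on disk."""
    interpreter = read_shebang(script_path)
    return interpreter is not None and os.path.exists(interpreter)
```

Calling `interpreter_exists` on the `htseq-count` script bundled with SPARTA (its location inside the SPARTA folder is not shown in the log, so the path would have to be supplied by hand) should return `False` if the Python 2.7 framework path in the error message is indeed missing on this machine.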

* I entered 4 conditions, per the example data set workflow instructions:

How many conditions are there?:4
Are you sure that's how many conditions you would like to compare? (y/n):y
Now we need to edit a text file to specify which files belong to a given condition and which files are replicates for each condition. The file names that you need to use are listed above under 'SPARTA has these files'.
The text file you need to edit (NOT WITH MICROSOFT WORD) using a text editor like TextEdit, is on your Desktop in RNAseq_Data -> date of the current run -> DEanalysis -> conditions_input.txt
Enter the relevant file names, with replicates separated by a comma. As an example, please see the 'conditions_input_example.txt' in the DEanalysis folder.
Once you have entered the file names, hit Enter/Return:

* I fixed the input file. By the way, it will not work if it contains spaces; in your tutorial there appear to be spaces between the colon and the filename on the first line, and between the comma-separated files.
My file:

Reference_Condition_Files:mapgly7a.sam,mapgly7b.sam
Experimental_Condition_2_Files:mapgly5a.sam,mapgly5b.sam
Experimental_Condition_3_Files:mappyr7a.sam,mappyr7b.sam
Experimental_Condition_4_Files:mappyr5a.sam,mappyr5b.sam
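For anyone who copied the spacing from the tutorial, here is a minimal sketch of how stray spaces could be stripped from `conditions_input.txt` lines. It assumes only the `Label:file1,file2` layout shown in my file; how SPARTA itself parses the file is not something I have checked, and `normalize_conditions_line` and `normalize_conditions_text` are hypothetical helper names.

```python
def normalize_conditions_line(line):
    """Remove whitespace around the colon and commas in one 'Label:file1,file2' line."""
    label, _, files = line.strip().partition(":")
    names = [name.strip() for name in files.split(",") if name.strip()]
    return label.strip() + ":" + ",".join(names)


def normalize_conditions_text(text):
    """Normalize every non-blank line of a conditions file, one condition per line."""
    lines = [normalize_conditions_line(l) for l in text.splitlines() if l.strip()]
    return "\n".join(lines)
```

For example, `normalize_conditions_line("Reference_Condition_Files: mapgly7a.sam, mapgly7b.sam")` gives `Reference_Condition_Files:mapgly7a.sam,mapgly7b.sam`, which matches the first line of the file above.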

* I hit enter...

Once you have entered the file names, hit Enter/Return:
trying URL 'https://bioconductor.org/packages/3.2/bioc/bin/macosx/mavericks/contrib/3.2/BiocInstaller_1.20.3.tgz'
Content type 'application/x-gzip' length 53075 bytes (51 KB)

downloaded 51 KB

The downloaded binary packages are in /var/folders/12/2j8hq03s52lbnstw5wh008k80000gq/T//RtmpQ1SlNz/downloaded_packages
Bioconductor version 3.2 (BiocInstaller 1.20.3), ?biocLite for help
A new version of Bioconductor is available after installing the most recent version of R; see http://bioconductor.org/install
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.2 (BiocInstaller 1.20.3), R 3.2.3 (2015-12-10).
Installing package(s) ‘edgeR’
also installing the dependency ‘limma’

trying URL 'https://bioconductor.org/packages/3.2/bioc/bin/macosx/mavericks/contrib/3.2/limma_3.26.9.tgz'
Content type 'application/x-gzip' length 2060369 bytes (2.0 MB)

downloaded 2.0 MB

trying URL 'https://bioconductor.org/packages/3.2/bioc/bin/macosx/mavericks/contrib/3.2/edgeR_3.12.1.tgz'
Content type 'application/x-gzip' length 1521484 bytes (1.5 MB)

downloaded 1.5 MB

The downloaded binary packages are in /var/folders/12/2j8hq03s52lbnstw5wh008k80000gq/T//RtmpQ1SlNz/downloaded_packages
Old packages: 'boot', 'cluster', 'codetools', 'foreign', 'lattice', 'MASS', 'Matrix', 'mgcv', 'nnet', 'rpart', 'survival'
Loading required package: limma
Warning messages:
1: package ‘edgeR’ was built under R version 3.2.4
2: package ‘limma’ was built under R version 3.2.4
Error in read.table("/Users/mel_local/Desktop/RNAseq_Data/2018-03-02/DEanalysis/mapgly7a.sam", : no lines available in input
Execution halted
Analysis complete. Thank you for using SPARTA.

* I checked why it failed on mapgly7a.sam. When I opened the .sam file in Vim, it was empty. I then checked the original downloaded fq.gz files for the example data:

wsb255bioimac27:SPARTA_Mac-master mel_local$ ll ~/Desktop/ExampleData/
total 87608
-rwxr-xr-x@ 1 mel_local staff 2.6M Mar 15 2016 alignMtbCDC1551.gtf
-rwxr-xr-x@ 1 mel_local staff 3.7M Mar 15 2016 gly5a.fq.gz
-rwxr-xr-x@ 1 mel_local staff 3.7M Mar 15 2016 gly5b.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.0M Mar 15 2016 gly7a.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.2M Mar 15 2016 gly7b.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.0M Mar 15 2016 pyr5a.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.0M Mar 15 2016 pyr5b.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.0M Mar 15 2016 pyr7a.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.1M Mar 15 2016 pyr7b.fq.gz
-rwxr-xr-x@ 1 mel_local staff 4.3M Mar 15 2016 trimmedMtbCDC1551.fa
-rwxr-xr-x@ 1 mel_local staff 4.2M Mar 15 2016 trimmedMtbCDC1551.fna

They looked small to me, but they are probably subsets and they are gzipped.
* I then checked the .sam files:

-rw-r--r-- 1 mel_local staff 8.6K Mar 2 09:59 DEexpression.r
-rw-r--r--@ 1 mel_local staff 223B Mar 2 09:59 conditions_input.txt
-rw-r--r-- 1 mel_local staff 159B Mar 2 09:55 conditions_input_example.txt
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mapgly5a.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mapgly5b.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mapgly7a.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mapgly7b.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mappyr5a.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mappyr5b.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mappyr7a.sam
-rw-r--r-- 1 mel_local staff 0B Mar 2 09:54 mappyr7b.sam


* Zero bytes. So the Bowtie mapping and/or indexing step in SPARTA apparently did not work.
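Since R only reports `no lines available in input` at the very end, it may help to catch empty files before the differential expression step. Below is a small sketch that lists zero-byte `.sam` files in a folder; `find_empty_sam_files` is a hypothetical helper, not a SPARTA function, and the folder would be the DEanalysis directory of the current run. The log above also shows `htseq-count` failing to start, so that step may be worth checking alongside Bowtie.

```python
import os


def find_empty_sam_files(folder):
    """Return the sorted names of zero-byte '.sam' files directly inside folder."""
    empty = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only regular files ending in .sam are considered; subfolders are skipped.
        if name.endswith(".sam") and os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty
```

On my run, pointing this at `/Users/mel_local/Desktop/RNAseq_Data/2018-03-02/DEanalysis` should list all eight `map*.sam` files, going by the directory listing above.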

Please advise. I am not using El Capitan, so I did not install anything beyond what the tutorial requested.
Thank you.
MKrasSt commented 6 years ago

Having this issue too!

gjiang06 commented 6 years ago

Same problem for me!