biocoder / Perl-for-Bioinformatics

An attempt to help anyone interested in using Perl for Bioinformatics

cuffcompare can't find transcripts.gtf files #9

Closed yeroslaviz closed 6 years ago

yeroslaviz commented 6 years ago

Hi,

I am having trouble executing the test run. The installation completed without errors, and all the required tools are installed.

The first problem I encountered was that tophat is an older version, so I had to switch to tophat2 (I created a softlink so that tophat points to tophat2).
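
For reference, the softlink workaround can be sketched as follows. This is only a minimal sketch: the function name `link_tophat` and the choice of target directory are illustrative assumptions, not part of lncRNApipe, and the directory has to be on `PATH` for the link to be picked up.

```shell
#!/usr/bin/env bash
# Sketch: expose an existing tophat2 binary under the name "tophat".
# link_tophat and the target directory are illustrative assumptions.
link_tophat() {
    local tophat2_path="$1"   # full path to the tophat2 executable
    local bin_dir="$2"        # a directory on PATH, e.g. "$HOME/bin"

    if [ ! -x "$tophat2_path" ]; then
        echo "tophat2 not found or not executable: $tophat2_path" >&2
        return 1
    fi
    mkdir -p "$bin_dir" || return 1
    # -s: symbolic link, -f: replace an existing link of the same name
    ln -sf "$tophat2_path" "$bin_dir/tophat"
}
```

It could be called as `link_tophat "$(command -v tophat2)" "$HOME/bin"`.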

The problem I have is with the cuffcompare part of the analysis.

This is a copy of the log file from the cuffcompare part of the run:

Mon Feb 12 15:27:41 2018    Validating options...
Mon Feb 12 15:27:41 2018    Starting ☲☴ lncRNApipe Pipeline...
Mon Feb 12 15:27:41 2018    ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare ]

Command call:
-------------
/usr/local/bin/cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf 

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Mon Feb 12 15:27:41 2018    ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR

Mon Feb 12 15:27:41 2018    ☲☴ lncRNApipe Pipeline aborted(?)

Mon Feb 12 15:27:41 2018    Validating options...

Mon Feb 12 15:27:41 2018    Starting ☲☴ lncRNApipe Pipeline...

Mon Feb 12 15:27:41 2018    ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare ]

Command call:
-------------
/usr/local/bin/cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/zf_noncode2016.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf 

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Mon Feb 12 15:27:41 2018    ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR
Mon Feb 12 15:27:41 2018    ☲☴ lncRNApipe Pipeline aborted(?)

But when I look for the files, I can see they are there:

yeroslaviz@cotopaxi:lncRNApipe$ ll /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf
-rw-rw-r-- 1 yeroslaviz yeroslaviz 3588058 Feb 12 15:27 /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf

Here is also a copy of the output I get when I run the tool:

perl lncRNApipe --conf lncRNApipe-test/params.test.yaml

Mon Feb 12 15:54:13 2018        Validating options...

Mon Feb 12 15:54:13 2018        Starting ☲☴ lncRNApipe Pipeline...

Mon Feb 12 15:54:13 2018        Preparing output directories...

Mon Feb 12 15:54:13 2018        Creating a GTF file with gene_biotype which is not "protein_coding"...

WARNING!
--------
[performTranscriptAssembly: YES] requested, but found [cuffcompare: ] options.
[cuffcompare: ] options are not required when you are performing transcript assembly with lncRNApipe.

Mon Feb 12 15:54:13 2018        Preparing job scripts...

Mon Feb 12 15:54:13 2018        Detected PE / MP reads...

Mon Feb 12 15:54:13 2018        Job script [ lncRNApipe_alignment_stage_run_jJMDrZlC1UCY.2cells_rep_1.sh  ] written...

Mon Feb 12 15:54:13 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_alignment_stage_run_jJMDrZlC1UCY.2cells_rep_1.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:13 2018        Waiting for 1 more job ID(s) in [ job_IDs_tophat.jJMDrZlC1UCY ]...

Mon Feb 12 15:54:23 2018        Job script [ lncRNApipe_assembly_stage_run_jJMDrZlC1UCY.2cells.sh  ] written...

Mon Feb 12 15:54:23 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_assembly_stage_run_jJMDrZlC1UCY.2cells.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:23 2018        Job script [ lncRNApipe_alignment_stage_run_jJMDrZlC1UCY.6h_rep_1.sh  ] written...

Mon Feb 12 15:54:23 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_alignment_stage_run_jJMDrZlC1UCY.6h_rep_1.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:23 2018        Waiting for 1 more job ID(s) in [ job_IDs_tophat.jJMDrZlC1UCY ]...

Mon Feb 12 15:54:33 2018        Job script [ lncRNApipe_assembly_stage_run_jJMDrZlC1UCY.6h.sh  ] written...

Mon Feb 12 15:54:33 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_assembly_stage_run_jJMDrZlC1UCY.6h.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:33 2018        Waiting for 1 more job ID(s) in [ job_IDs_cufflinks.jJMDrZlC1UCY ]...

Mon Feb 12 15:54:43 2018        Job script [ lncRNApipe_postAssembly.run_jJMDrZlC1UCY.sh  ] written...

Mon Feb 12 15:54:43 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_postAssembly.run_jJMDrZlC1UCY.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:43 2018        cuffdiff run on all cufflinks' transcripts requested...

Mon Feb 12 15:54:43 2018        Generating separate job script using the same job dependency chain...

Mon Feb 12 15:54:43 2018        Job script [ lncRNApipe_postAssembly_diffExp.run_jJMDrZlC1UCY.sh  ] written...

Mon Feb 12 15:54:43 2018        No scheduler specified. Running it in background with bash...

@ -------------------------------------------------------------------- @

Command call:
-------------
bash /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/job_scripts.Mon_Feb_12_15_54_13_2018/lncRNApipe_postAssembly_diffExp.run_jJMDrZlC1UCY.sh &

@ -------------------------------------------------------------------- @

Mon Feb 12 15:54:43 2018        All job scripts have been submitted. Please check these log files for any results / errors.

/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/trimmomatic/run_jJMDrZlC1UCY.2cells_rep_1/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/tophat/run_jJMDrZlC1UCY.2cells_rep_1/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.2cells/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/trimmomatic/run_jJMDrZlC1UCY.6h_rep_1/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/tophat/run_jJMDrZlC1UCY.6h_rep_1/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/run.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/lncRNApipe.postAssembly.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffmerge_known_ncRNAs/cuffmerge_known_ncRNAs.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffmerge_novel_ncRNAs/cuffmerge_novel_ncRNAs.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffdiff_on_known.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_novel_ncRNAs/cuffdiff_on_novel.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_on_transcripts/cuffmerge_on_cufflinks_transcripts.run_jJMDrZlC1UCY.log
/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_on_transcripts/cuffdiff_on_all_transcripts.run_jJMDrZlC1UCY.log

At the bottom I have also attached a copy of the param.yaml file so that the parameters can be checked. In the cuffcompare part I have tried both with and without the -i parameter, but even when it is commented out, it still doesn't run. I am not sure about the -i option. Do I need to create this assembly_list.txt file in advance, or is it created by the script when needed? In any case, this file doesn't exist in my folder. Do I need to create it manually?
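
On the -i question: going by the comments in the configuration file below, the list is only needed when the assembly stage is skipped, and it contains the full path of each assembled transcript file. A minimal sketch of building such a file by hand, assuming the cufflinks output follows the `run/cufflinks/<sample>/transcripts.gtf` layout seen in the log above (the function name `write_assembly_list` is illustrative, not part of lncRNApipe):

```shell
#!/usr/bin/env bash
# Sketch: write one absolute path per line for every transcripts.gtf
# found under a cufflinks output directory.
# write_assembly_list is a hypothetical helper name.
write_assembly_list() {
    local cufflinks_dir
    local list_file="$2"

    # Resolve to an absolute path so the list holds full paths
    cufflinks_dir=$(cd "$1" && pwd) || return 1

    find "$cufflinks_dir" -type f -name 'transcripts.gtf' | sort > "$list_file"

    # Fail if nothing was found, so an empty list is never used silently
    [ -s "$list_file" ]
}
```

For example, `write_assembly_list run/cufflinks assembly_list.txt` would produce the file that the `-i` option refers to.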

I hope this repository is still maintained. I am sorry to say that even sending the report is not working.

Thanks for the help

Assa

---
# This is the configuration file, where you can specify the command line
# options for each of the modules of lncRNApipe. If you do not wish to
# run any of the modules, simply, disable them and all the lines following
# it (if, any) by prefixing it with #
#
# 
#
# Specify output directory.
# In the example below, a  new output directory called "run"
# will be created at the mentioned path  /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test.
#
outputDir: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run

#
# Indicate, if we should overwrite the output directory if it already exists.
#
# WARNING: THIS WILL REMOVE ANY FILES / DIRECTORIES 
#
overwriteOutputDir: NO

#
# Let lncRNApipe know if you intend to perform transcript assembly
# using tophat / cufflinks or just identify ncRNAs from already
# assembled transcripts.
#
performTranscriptAssembly: YES

#
# Specify number of threads / CPUs to use where possible.
# If running on a grid, each job script will run with this many
# number of CPUs.
#
# ********************* !! IMPORTANT !! ***********************
# *************************************************************
#
# Please be aware that each command will take this many number
# of CPUs. If you have 2 samples, then 2 tophat runs will take
# 10 CPUs each etc... In case of grid computing, this is easily
# managed via job parameters but if you are not using grid computing,
# please make sure you are not exceeding the total number of CPUs
# available on your machine. 
#
# For example, if you are not using grid computing and your  machine 
# has 24 CPUs, and you have 6 samples including all replicates, 
# you may want to use as rule of thumb: 
#
# Desired number of CPUs = Total number of CPUs divided by total
# number of samples (including replicates, if any)
# 
CPUs: 16

#
# Specify scheduler type.
# Valid options are PBS, SGE, LSF or NONE to disable grid computing.
#
#scheduler: SGE
scheduler: NONE

#
# Mention batch submission command.
# For example, for LSF, it is bsub, for PBS or SGE, it
# is qsub and in case you are not running jobs on the cluster, change
# it to BASH.
#
# batchSubCmd: qsub
batchSubCmd: BASH

#
# Specify scheduler options on separate line
# specific to your job running environment.
#
# ********************* !! IMPORTANT !! ***********************
# *************************************************************
#
# Different clusters uses different job parameter names. For
# example our cluster uses num_threads as parameter name to
# request number of CPUs in SGE. Some clusters may use
# num_cpu. Since it is difficult to guess, please provide
# number of CPUs you want to use based on your grid environment
# below, which is equal to "CPUs:" option above. Yes, we know that
# this is kind of redundant but it is necessary evil. What ever
# job parameters you provide after the - , they will appear
# exactly in the job script.
#
# IF IT IS NOT PROVIDED, THE JOB WILL RUN ON SINGLE CPU.
#
# *************************************************************
# *************************************************************
#
# Ex for PBS (job parameters start with a PBS directive):
#
# PBS -V
# PBS -l nodes=1:ppn=16,walltime=24:00:00
#
# Ex for SGE (job parameters start with $ sign):
#
# $ -N lncRNApipe
# $ -l num_threads=4
#
schedulerOpts:
# - $ -l mem_free=10G
 - $ -l num_threads=16

#clusterOpts:
# - module load lncRNApipe
#
# Specify if your reads are in FASTA or FASTQ format.
#
readType: FASTQ

#
# Specify if your reads are SE (single-end only), PE (paired-end only) or MIXED
#
libType: MIXED

#
# Specify if you want to use ENSEMBL or UCSC as source for annotation.
# "PLEASE MAKE SURE" that you are using the same sourceDB for tophat
# alignment, i.e. bowtie indexes created from the same sourceDB for the
# same assembly version. As downloading genome indices on-the-fly is time
# consuming and since many users may have genome indices installed for
# various NGS analysis anyways, we leave it to you to provide "CORRECT"
# genome index below in tophat and cufflinks options' configuration.
#
sourceDB: ENSEMBL

#
# Choose assembly version so that we can download up-to-date annotation on the fly.
# You can choose to supply your own annotation file for consistency below.
# ENSEMBL and UCSC represents species name differently in the URLs.
# To view species names for ENSEMBL do, "perl lncRNApipe --list ENSEMBL".
# Go to https://genome.ucsc.edu/FAQ/FAQreleases.html#TOP to view UCSC version names.
# For example, at UCSC, for mouse, it is mm9 or mm10, for rat, it is rn5 or rn6 etc...
#
species: Danio_rerio.Zv9.66

#
# If you want to use your own annotation file, provide full path to the annotation
# file of your choice. This option overrides "sourceDB: " and "species: " options above.
# Again, "PLEASE MAKE SURE", you are providing the same genome index from the
# same source. In the example below, "bowtieGenomeIndex" is rn4 genome index created
# from FASTA files from UCSC.
#
useThisAnnotation: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf

#
# Provide Unix path to directory where the read files are located and also
# provide read file names.
# If your data is just SE, then disable "r2: " below by prefixing it with a #.
# Separate replicates of sample by a comma. Separate different samples by a |.
#
# PLEASE MAKE SURE THE READS ARE IN ORDER. DO NOT MIX AND MATCH READ ORDERS.
# WE HAVE NO WAY OF KNOWING WHICH READ FILE IS _1 or _2.
#
reads:
 readsDir: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/reads
 r1: 2cells_1_25000_reads.fq|6h_1_25000_reads.fq
 r2: 2cells_2_25000_reads.fq|6h_2_25000_reads.fq

#
# If you want to use TRIMMOMATIC to trim reads, then provide
# TRIMMOMATIC options. TRIMMOMATIC provides the following adapter
# files: NexteraPE-PE.fa, TruSeq2-PE.fa, TruSeq2-SE.fa, TruSeq3-PE-2.fa,
# TruSeq3-PE.fa and TruSeq3-SE.fa. Choose one below or provide full path
# to the adapter sequence file you want to use.
#
# No need to provide -threads, as it will be handled by lncRNApipe.
#
trimmomatic:
 - ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
 - LEADING:20
 - TRAILING:20
 - SLIDINGWINDOW:5:20
 - MINLEN:25

#
# Provide bowtie genome index, full path to genome multi FASTA and tophat options.
# "PLEASE MAKE SURE" that you are using the same genome indices for the sourceDB
# mentioned above. Do not use UCSC genome indices if you requested ENSEMBL above in
# "sourceDB: " or vice versa.
#
# In the example below, it is assumed that you have already created a transcriptome
# index. If you have not created one, go through tophat manual on how to just create
# transcriptome index (https://ccb.jhu.edu/software/tophat/manual.shtml). If you
# use "--transcriptome-index" below and the index does not exist, it will keep overwriting
# while the jobs are running.
#
# No need to provide FASTQ files, as they will be handled by lncRNApipe.
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
#
genomeFasta: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa
bowtieGenomeIndex: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/Danio_rerio.Zv9.66.dna
tophat:
 - --b2-sensitive
 - --transcriptome-index=/home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/Danio_rerio.Zv9.66
 - --no-coverage-search

#
# Provide cufflinks options and also "PLEASE MAKE SURE" you provide the same reference
# FASTA for the sourceDB you want to use in case of bias correction.
#
# No need to provide -g, as it will be handled by lncRNApipe.
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
# No need to provide input files, as they will be handled by lncRNApipe.
#
cufflinks:
 - -u
 - -b /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa
 - -g /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf

#
# Provide cuffcompare options.
#
# ******************************** !! IMPORTANT !! ******************************
# *******************************************************************************
#
# If you only want to run lncRNApipe without the transcript assembly stage, then
# provide assembled transcripts in GTF format, otherwise, no options are needed.
#
# *******************************************************************************
# *******************************************************************************
#
# No need to provide -r, as it will be handled by lncRNApipe.
# Provide "-i transcript_assembly_list.txt" if you did not run assembly stage,
# where transcript_assembly_list.txt contains full path to assembled transcript
# files.
#
cuffcompare:
 - -i /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/assembly_list.txt

#
# Provide categorize_ncRNAs.pl options
# See "perl lncRNApipe -h cat" for description of options.
# 
categorize_ncRNAs:
 - -sample-names "2cells,6h"
 - -len 200
 - -min-exons 1
 - -ov 80
 - -inc
 - -ignore-genePred-err

#
# Provide get_unique_features.pl options.
# Generally only "-ov" is required in either
# case of [performTranscriptAssembly: YES] or
# [performTranscriptAssembly: NO].
#
# When you supply your own known ncRNAs file to
# compare against with "-sf", then you MUST also
# specify it's file format (Ex: gtf or bed) with
# "-sff" option.
# 
# See "perl lncRNApipe --h get" for description of options.
#
get_unique_features:
 - -ov 80
 - -sf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/zf_noncode2016.gtf
 - -sff gtf

#
# By, default, RNAfold is not run since it is very slow.
# It is generally recommended to generate RNAfold plots
# based on the transcript of your interest after you have
# investigated the results, but you can still enable it
# in the pipeline by uncommenting "runRNAfold: YES" to
# run RNAfold with default options.
#
# To pass command line options to RNAfold, define it's
# after line "RNAfold:" 
#
# Mention any other option other than "-p" and "--noPS" as they
# are automatically handled by lncRNApipe.
#
# See "perl lncRNApipe -h rna" for description of options.
#
runRNAfold: YES
RNAfold:
# - --circ

#
# Provide options to cmscan. Running cmscan with
# default options provides good matches in
# most cases, but in any case you want to add
# extra options, do it here.
#
# To run cmscan with default options, use
# "runcmscan: YES".
#
# Add additional options you want to pass to cmscan
# after line "cmscan:". 
# Provide any options other than "-o", "--tblout"
# and "--cpu" as they are automatically handled by 
# lncRNApipe.
#
# See "perl lncRNApipe -h inf" for description of options.
#
runcmscan: YES
cmscan:
# - -E 9.0

#
# Provide cuffmerge options.
# If you have replicates, final predicted lncRNAs will be merged.
#
# No need to provide -s, as it will be handled by lncRNApipe.
# No need to provide -g, as it will be handled by lncRNApipe.
#
cuffmerge:
 - --min-isoform-fraction 0.05

#
# Provide cuffdiff options if you want to run differential expression
# tests between known ncRNAs and between novel ncRNAs in your samples.
# 
# Normally, cuffdiff is only run on identified known and novel ncRNAs.
# If you want to run cuffdiff on all transcripts, i.e to identify
# differentially expressed transcripts (end of typical tuxedo pipeline),
# then change "runCuffdiffForAllTranscripts" to YES. 
#
# If you have mentioned "-sample-names " above in "categorize_ncRNAs: ",
# no need to mention -L, else provide -L option here.
#
# No need to provide -o, as it will be handled by lncRNApipe.
# No need to provide -p, as it will be handled by lncRNApipe.
# No need to provide input files, if you have run tophat, cufflinks above.
#
# If "performTranscriptAssembly" is NO, then provide your own BAM files here prefixed by
# -bam option.
# Separate replicate BAM files with comma as you generally do with cuffdiff command.
#
runCuffdiffForAllTranscripts: YES
cuffdiff:
 - -u
 - -b /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa

biocoder commented 6 years ago

Yes. Please comment out these lines:

cuffcompare:
 - -i /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/assembly_list.txt

Change overwriteOutputDir: NO to overwriteOutputDir: YES.

Also, please make sure your $SHELL is /bin/bash. See here: https://github.com/biocoder/Perl-for-Bioinformatics/tree/master/NGS-Utils#caveats
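
A quick way to check this, as a minimal sketch: `$SHELL` is the login shell, while `/bin/sh` is a separate link that may resolve to dash on some systems. The helper names `is_bash_path` and `report_shells` are illustrative, and `readlink -f` is assumed to be available (GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: check whether a path ultimately resolves to a binary named "bash".
# is_bash_path and report_shells are hypothetical helper names.
is_bash_path() {
    local resolved
    resolved=$(readlink -f "$1" 2>/dev/null) || return 1
    [ "$(basename "$resolved")" = "bash" ]
}

# Print the login shell and what /bin/sh resolves to.
report_shells() {
    echo "login shell (\$SHELL): ${SHELL:-unset}"
    echo "/bin/sh resolves to: $(readlink -f /bin/sh 2>/dev/null)"
}
```

Running `report_shells`, followed by `is_bash_path /bin/sh && echo "sh is bash"`, shows both values at a glance.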

If after following these steps, it is still failing, try executing the cuffcompare command with -V option to see if cuffcompare prints any extra debug information.

/usr/local/bin/cuffcompare -V -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_jJMDrZlC1UCY.6h/transcripts.gtf 

If none of the suggestions above work, please let me know.

Cheers.

yeroslaviz commented 6 years ago

Hi, thanks for the fast answer. Unfortunately, this didn't work. The two lines are commented out and the overwrite option is enabled. I have also checked for the bash/dash problem.

There are 2 choices for the alternative sh (providing /bin/sh).                                                                                                                                                                                              
  Selection    Path            Priority   Status
------------------------------------------------------------
  0            /bin/dash        200       auto mode
* 1            /bin/bash        100       manual mode
  2            /bin/dash        200       manual mode

But when I run the script again, it goes all the way through and still doesn't manage to run cuffcompare.

This is the output of the log file from the postAssembly step:

cat lncRNApipe/lncRNApipe-test/run/lncRNApipe.postAssembly.run_Gvj2dFgCL1ZI.log 

Tue Feb 13 10:05:47 2018        Validating options...

Tue Feb 13 10:05:47 2018        Starting ☲☴ lncRNApipe Pipeline...

Tue Feb 13 10:05:47 2018        ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare ]

Command call:
-------------
/usr/local/bin/cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf 

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Tue Feb 13 10:05:47 2018        ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR

Tue Feb 13 10:05:47 2018        ☲☴ lncRNApipe Pipeline aborted(?)

Tue Feb 13 10:05:47 2018        Validating options...

Tue Feb 13 10:05:47 2018        Starting ☲☴ lncRNApipe Pipeline...

Tue Feb 13 10:05:47 2018        ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare ]

Command call:
-------------
/usr/local/bin/cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/zf_noncode2016.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf 

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Tue Feb 13 10:05:47 2018        ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR

Tue Feb 13 10:05:47 2018        ☲☴ lncRNApipe Pipeline aborted(?)

But the files are there:

ll /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/
total 4204
drwxrwxr-x 2 yeroslaviz yeroslaviz    4096 Feb 13 10:05 ./
drwxrwxr-x 4 yeroslaviz yeroslaviz    4096 Feb 13 10:05 ../
-rw-rw-r-- 1 yeroslaviz yeroslaviz   92361 Feb 13 10:06 genes.fpkm_tracking
-rw-rw-r-- 1 yeroslaviz yeroslaviz  149342 Feb 13 10:06 isoforms.fpkm_tracking
-rw-rw-r-- 1 yeroslaviz yeroslaviz  460485 Feb 13 10:06 run.log
-rw-rw-r-- 1 yeroslaviz yeroslaviz       0 Feb 13 10:05 skipped.gtf
-rw-rw-r-- 1 yeroslaviz yeroslaviz 3588043 Feb 13 10:06 transcripts.gtf
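
One detail in this listing may be worth checking, offered as a hypothesis and not a confirmed diagnosis: transcripts.gtf shows a modification time of 10:06, while the postAssembly log above is stamped 10:05:47. A minimal sketch for comparing the two files, using the shell's `-nt` (newer than) test; the function name `written_after` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: succeed when input_file was last modified after reference_file,
# which would mean the input may not have been complete when the
# reference (e.g. the postAssembly log) was written.
# written_after is a hypothetical helper name.
written_after() {
    local input_file="$1"
    local reference_file="$2"

    [ -e "$input_file" ] || return 1
    [ -e "$reference_file" ] || return 1
    [ "$input_file" -nt "$reference_file" ]
}
```

For example: `written_after run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf run/lncRNApipe.postAssembly.run_Gvj2dFgCL1ZI.log && echo "gtf is newer than the log"`.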

Any ideas?

Thanks Assa

P.S.

When running the cuffcompare command separately, it creates the following output:

/usr/local/bin/cuffcompare -T -o tmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf 
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).

yeroslaviz@cotopaxi:MarcoN.RNASeq.GLD2$ ll -rt
total 9292
...
-rw-rw-r--  1 yeroslaviz yeroslaviz  348739 Feb 13 10:14 tmp.tracking
-rw-rw-r--  1 yeroslaviz yeroslaviz  134225 Feb 13 10:14 tmp.loci
-rw-rw-r--  1 yeroslaviz yeroslaviz 3335119 Feb 13 10:14 tmp.combined.gtf
-rw-rw-r--  1 yeroslaviz yeroslaviz    2780 Feb 13 10:14 tmp.stats

yeroslaviz commented 6 years ago

There is another strange behavior I have found. Your pipeline creates scripts for each of the steps, so I have looked at the script created for the postAssembly step, lncRNApipe_postAssembly.run_Gvj2dFgCL1ZI.sh. In the file I can see the command it tries to execute for the cuffcompare step (line 41).

perl /local/Assa/projects/MarcoN.RNASeq.GLD2/lncRNApipe/lncRNApipe --run /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run --cpu 16 --cuffcompare '-r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_Gvj2dFgCL1ZI.6h/transcripts.gtf ' --cat-ncRNAs '-sample-names "2cells,6h" -len 200 -min-exons 1 -ov 80 -inc -ignore-genePred-err' --get '-ov 80 -sf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/zf_noncode2016.gtf -sff gtf' --fetch --cpc --inf --rna > /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/lncRNApipe.postAssembly.run_Gvj2dFgCL1ZI.log 2>&1

When I copy and paste this command into the terminal, it runs. I stopped it at the CPC step, but up to that point it ran with no problems.

Any ideas where the problem might be?

Thanks, Assa

biocoder commented 6 years ago

That is strange; I have never encountered this error. The "Error: cannot locate input file:" message comes from cuffcompare itself, which is installed at /usr/local/bin. Just to rule out any issues with that binary:

Can you edit the file /local/Assa/projects/MarcoN.RNASeq.GLD2/lncRNApipe/.lncRNApipe.depconf and replace /usr/local/bin/cuffcompare with /local/Assa/projects/MarcoN.RNASeq.GLD2/lncRNApipe/.lncRNApipe.depbin/linux_cuffcompare and retry?

Do you see the same error with cuffcompare then?

yeroslaviz commented 6 years ago

Yes, it is the same error; only the path to cuffcompare is different.

cat /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/lncRNApipe.postAssembly.run_2FHTl2i3VOJus.log

Tue Feb 13 16:50:08 2018        Validating options...

Tue Feb 13 16:50:08 2018        Starting ☲☴ lncRNApipe Pipeline...

Tue Feb 13 16:50:08 2018        ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare ]

Command call:
-------------
/local/Assa/projects/MarcoN.RNASeq.GLD2/lncRNApipe/.lncRNApipe.depbin/linux_cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genes.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.6h/transcripts.gtf

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Tue Feb 13 16:50:08 2018        ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR

Tue Feb 13 16:50:08 2018        ☲☴ lncRNApipe Pipeline aborted(?)

Tue Feb 13 16:50:08 2018        Validating options...

Tue Feb 13 16:50:08 2018        Starting ☲☴ lncRNApipe Pipeline...

Tue Feb 13 16:50:08 2018        ########################### Module 1: Running cuffcompare... #########################################

Making output directory for cuffcompare [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare ]

Command call:
-------------
/local/Assa/projects/MarcoN.RNASeq.GLD2/lncRNApipe/.lncRNApipe.depbin/linux_cuffcompare -T -o /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp -r /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/zf_noncode2016.gtf -s /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/genome.fa /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.2cells/transcripts.gtf /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.6h/transcripts.gtf

Error: cannot locate input file: /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cufflinks/run_2FHTl2i3VOJus.6h/transcripts.gtf

INFO!!
--------
Using local FASTA reference [ genome.fa ] to fetch sequences to maintain consistency.
** Requires Bio::SeqIO module to be installed and available **

Tue Feb 13 16:50:08 2018        ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Cannot find Cuffcompare tracking [ /home/yeroslaviz/projects/MarcoN.RNASeq.GLD2/lncRNApipe//lncRNApipe-test/run/cuffdiff_known_ncRNAs/cuffcompare/lncRNApipe_cuffcmp.tracking ] file...

ERROR

Tue Feb 13 16:50:08 2018        ☲☴ lncRNApipe Pipeline aborted(?)
yeroslaviz commented 6 years ago

The problem was that cuffcompare started before cufflinks had finished, which is why it reported the error. This has been fixed, and the workflow now runs all the way through.
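
For anyone who runs into the same race, here is a minimal sketch of a guard that waits for the cufflinks outputs before cuffcompare is launched. It is not how lncRNApipe fixes it; the function name wait_for_files and its timeout, poll, and settle parameters are made up for illustration. It treats a file as ready once it exists, is non-empty, and has kept the same size over a few polls. That is only a heuristic; waiting for the cufflinks processes themselves to exit is more reliable where that is possible.

```python
import os
import time


def _size(path):
    """Return the size of path in bytes, or 0 if it cannot be read yet."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def wait_for_files(paths, timeout=3600.0, poll=5.0, settle=2):
    """Wait until every path is non-empty and its size has stopped changing.

    Hypothetical helper, not part of lncRNApipe. A path counts as ready after
    its size was unchanged for `settle` consecutive polls. Returns True when
    all paths are ready, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = {}
    stable = {p: 0 for p in paths}
    while True:
        all_ready = True
        for p in paths:
            size = _size(p)
            if size > 0 and last_size.get(p) == size:
                stable[p] += 1
            else:
                stable[p] = 0
            last_size[p] = size
            if stable[p] < settle:
                all_ready = False
        if all_ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

It would be called with the transcripts.gtf paths from the command call above, and cuffcompare would only be started if it returns True.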

Apparently there is still an incorrect notification in the last status.log file after the postAssembly step. When no novel ncRNAs are found, a "bailing" notification is written to the log file, and as a result the last log file reports "pipeline failed".