We have developed a new pipeline, NCLscan, which is rather advantageous in the identification of both intragenic and intergenic "non-co-linear" (NCL) transcripts (fusion, trans-splicing, and circular RNA) from paired-end RNA-seq data.
I ran it with the following command:
/data/Ziegelbauer_lab/tools/NCLscan-1.7.0/NCLscan.py \
  -c /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/nclscan.config \
  -o test \
  -pj iSLK-BAC16_Uninduced_R2 \
  --fq1 /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/results/iSLK-BAC16_Uninduced_R2/trim/iSLK-BAC16_Uninduced_R2.R1.trim.fastq.gz \
  --fq2 /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/results/iSLK-BAC16_Uninduced_R2/trim/iSLK-BAC16_Uninduced_R2.R2.trim.fastq.gz
I am getting this error:
There are no main datasets assigned.
Any idea? Here is the config file:
#############################
### NCLscan Configuration ###
#############################
## The directory of references and indices
## The script "create_reference.py" would create the needed references and indices here.
NCLscan_ref_dir = /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/ref/NCLscan_index
## The following four reference files can be downloaded from the GENCODE website (http://www.gencodegenes.org/).
## The reference genome sequence, eg. /path/to/GRCh37.p13.genome.fa
Reference_genome = /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/ref/ref.fa
## The gene annotation file, eg. /path/to/gencode.v19.annotation.gtf
Gene_annotation = /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/ref/ref.fixed.gtf
## The protein-coding transcript sequences, eg. /path/to/gencode.v19.pc_transcripts.fa
Protein_coding_transcripts = /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/ref/ref.transcripts.fa
## The long non-coding RNA transcript sequences, eg. /path/to/gencode.v19.lncRNA_transcripts.fa
lncRNA_transcripts = /vf/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.8.x/samples_3_032223/ref/ref.dummy.fa
## External tools
## these are set to "module load" on BIOWULF
bedtools_bin = /usr/local/apps/bedtools/2.29.0/bin/bedtools
blat_bin = /usr/local/apps/blat/3.5/blat
bwa_bin = /usr/local/apps/bwa/0.7.17/bwa
samtools_bin = /usr/local/apps/samtools/1.15.1/bin/samtools
novoalign_bin = /usr/local/apps/novocraft/4.03.05/novoalign
novoindex_bin = /usr/local/apps/novocraft/4.03.05/novoindex
## Bin
NCLscan_bin = {NCLscan_dir}/bin
Add_read_count_bin = {NCLscan_bin}/Add_read_count.py
AssembleExons_bin = {NCLscan_bin}/AssembleExons
AssembleFastq_bin = {NCLscan_bin}/AssembleFastq
AssembleJSeq_bin = {NCLscan_bin}/AssembleJSeq.py
append_Z3_tag = {NCLscan_bin}/append_Z3_tag.py
FastqOut_bin = {NCLscan_bin}/FastqOut
get_gene_name_bin = {NCLscan_bin}/get_gene_name.py
GetInfo_bin = {NCLscan_bin}/GetInfo
GetKey_bin = {NCLscan_bin}/GetKey
GetNameB4Dot_bin = {NCLscan_bin}/GetNameB4Dot
InsertInList_bin = {NCLscan_bin}/InsertInList
JSFilter_bin = {NCLscan_bin}/JSFilter
JSParser_bin = {NCLscan_bin}/JSParser
JunctionSite2BED_bin = {NCLscan_bin}/JunctionSite2BED
mp_blat_bin = {NCLscan_bin}/mp_blat.py
PslChimeraFilter_bin = {NCLscan_bin}/PslChimeraFilter
RemoveInList_bin = {NCLscan_bin}/RemoveInList
RetainInList_bin = {NCLscan_bin}/RetainInList
RmBadMapping_bin = {NCLscan_bin}/RmBadMapping
RmColinearPairInSam_bin = {NCLscan_bin}/RmColinearPairInSam
RmRedundance_bin = {NCLscan_bin}/RmRedundance
SeqOut_bin = {NCLscan_bin}/SeqOut
###########################
### Advanced parameters ###
###########################
## The following two parameters indicate the maximal read length (L) and fragment size of the used paired-end RNA-seq data (FASTQ files), where fragment size = 2L + insert size.
## If L > 151, the users should change these two parameters to (L, 2L + insert size).
max_read_len = 151
max_fragment_size = 500
## The base quality threshold. The value should be a non-negative integer.
quality_score = 20
## The collection of the supporting reads must span the NCL junction boundary by the setting size of span range on both sides of the junction site.
span_range = 50
###################
### Performance ###
###################
## Parameters for bwa mem
## The number of threads
bwa-mem-t = 56
## Parameters for mp_blat.py
## The number of processes for running blat
##
## NOTE: The memory usage of each blat process would be up to 4 GB!
##
mp_blat_process = 56
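To help rule out simple problems in a config like the one above, here is a minimal Python sketch. It is not part of NCLscan, and the function names (`parse_config`, `missing_paths`, `blat_memory_gb`, `fragment_size_ok`) are hypothetical. It assumes the file is plain `key = value` lines with `#` comments, does not expand placeholders such as `{NCLscan_bin}`, and takes the 4 GB per blat process figure from the config's own note. It only checks that paths exist and that the numbers are consistent, so it may not explain the error itself.

```python
import os


def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        cfg[key.strip()] = value.strip()
    return cfg


def missing_paths(cfg, keys):
    """Return those keys (of the ones given) whose value is not an existing path."""
    return [k for k in keys if k in cfg and not os.path.exists(cfg[k])]


def blat_memory_gb(cfg, gb_per_process=4):
    """Upper bound on blat memory: processes x GB per process (4 GB per the config note)."""
    return int(cfg["mp_blat_process"]) * gb_per_process


def fragment_size_ok(cfg):
    """Fragment size = 2L + insert size, so it must be at least 2 * max_read_len."""
    return int(cfg["max_fragment_size"]) >= 2 * int(cfg["max_read_len"])


# Small inline sample using values from the config above (paths omitted).
sample = """
max_read_len = 151
max_fragment_size = 500
mp_blat_process = 56
"""
cfg = parse_config(sample)
print("blat memory upper bound (GB):", blat_memory_gb(cfg))
print("fragment size consistent:", fragment_size_ok(cfg))
```

To check the real file, one could read `nclscan.config` with `open(...).read()`, pass the text to `parse_config`, and call `missing_paths` with keys such as `Reference_genome`, `Gene_annotation`, `bwa_bin`, and `novoalign_bin`. The memory estimate is worth comparing against what the job was actually allocated.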