TreesLab / NCLscan

We have developed a new pipeline, NCLscan, which is rather advantageous in the identification of both intragenic and intergenic "non-co-linear" (NCL) transcripts (fusion, trans-splicing, and circular RNA) from paired-end RNA-seq data.
MIT License

Error in NCLScan Run #8

Closed onkarnath89 closed 5 years ago

onkarnath89 commented 8 years ago

I recently installed this tool for circular RNA detection, but it produces an error. I am working with a plant sample, so I prepared the GTF file from a GFF file (obtained from NCBI) using gffread. The error I am getting is shown below:

. . . . .

Read 0 genes, 0 transcripts and 138234 exons from the gtf file.

novoalign (V3.04.06 - Build May 18 2016 @ 16:23:46) - A short read aligner with qualities.

(C) 2008-2015 Novocraft Technologies Sdn Bhd.

License file: /home/mjlabscis/Softwares/novocraft/novoalign.lic

Licensed to Jawaharlal Nehru University

novoalign -r A 1 -t 0,1 -d output_tmp/tmp_NCLscan.JS2.ndx -f output_tmp/tmp_NCLscan.main.unmapped_1.fastq output_tmp/tmp_NCLscan.main.unmapped_2.fastq --3Prime -o SAM

Starting at Fri Jul 29 10:25:03 2016

Interpreting input files as Illumina FASTQ, Casava Pipeline 1.3 to 1.7.

Index Build Version: 3.4

Hash length: 6

Step size: 1

Paired Reads: 661919

Proper Pairs: 87 ( 0.0%)

Read Sequences: 1323838

Unique Alignment: 643 ( 0.0%)

Multi Mapped: 21 ( 0.0%)

No Mapping Found: 1323072 (99.9%)

QC Failures...

Homopolymer Filter: 102 ( 0.0%)

Elapsed Time: 16.659 (secs.)

CPU Time: 11.18 (min.)

Fragment Length Distribution

From To Count

105 119 2

120 134 2

135 149 4

150 164 6

165 179 7

180 194 2

195 209 2

210 224 3

225 239 6

240 254 9

255 269 10

270 284 8

285 299 15

300 314 4

315 329 0

330 344 3

345 359 1

360 374 0

375 389 0

390 404 3

Mean 244, Std Dev 65.1

Done at Fri Jul 29 10:25:20 2016

Start to split the input into 16 part ...
Time cost of split_file = 0.00453305244446 sec
Start to do the mp_blat using 16 processes ...
Time cost of mp_blat = 11.9849720001 sec
Merging results ...
Time cost of merge_result = 0.000910997390747 sec

Start to split the input into 16 part ...
Time cost of split_file = 0.00453400611877 sec
Start to do the mp_blat using 16 processes ...
Time cost of mp_blat = 1.68700289726 sec
Merging results ...
Time cost of merge_result = 0.00147986412048 sec

Start to split the input into 16 part ...
Time cost of split_file = 0.00458097457886 sec
Start to do the mp_blat using 16 processes ...
Time cost of mp_blat = 0.192430019379 sec
Merging results ...
Time cost of merge_result = 0.00123000144958 sec

Start to split the input into 16 part ...
Time cost of split_file = 0.00454807281494 sec
Start to do the mp_blat using 16 processes ...
Time cost of mp_blat = 0.0459880828857 sec
Merging results ...
Time cost of merge_result = 0.00110507011414 sec

Traceback (most recent call last):
  File "/home/mjlabscis/Softwares/NCLscan-1.6/bin/get_gene_name.py", line 91, in <module>
    add_gene_name(args.result_tmp_file, args.gene_anno, args.output)
  File "/home/mjlabscis/Softwares/NCLscan-1.6/bin/get_gene_name.py", line 26, in add_gene_name
    result_tmp_data_with_gene_name.append(line_list + [','.join(gene_name_1), ','.join(gene_name_2), isIntragenic])
TypeError: sequence item 0: expected string, NoneType found

Traceback (most recent call last):
  File "./NCLscan.py", line 448, in <module>
    NCL_Scan4(config, datasets_list, args.project_name, args.output_dir)
  File "./NCLscan.py", line 255, in NCL_Scan4
    final_tmp = read_TSV("{prefix}.result.tmp3".format(**config_options))
  File "./NCLscan.py", line 279, in read_TSV
    with open(tsv_file) as data_reader:
IOError: [Errno 2] No such file or directory: 'output_tmp/tmp_NCLscan.result.tmp3'
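For reference, the first TypeError above is what Python raises when `','.join(...)` receives a list containing `None`. The snippet below is only a minimal sketch of that behaviour, not code from NCLscan: the helper `join_names` and the `"NA"` placeholder are hypothetical names chosen for illustration, and the exact wording of the message differs between Python 2 and Python 3.

```python
def join_names(names, placeholder="NA"):
    """Join gene names with commas, replacing missing (None) entries.

    `placeholder` is a hypothetical stand-in value, not something NCLscan defines.
    """
    return ",".join(placeholder if name is None else name for name in names)


# Reproduce the failure: str.join rejects any item that is not a string.
try:
    ",".join([None, "geneB"])
    failure = ""
except TypeError as err:
    failure = str(err)

# The same list joins cleanly once None entries are replaced.
safe = join_names([None, "geneB"])

print(failure)
print(safe)
```

This suggests that at least one gene name looked up from the annotation came back as `None`, which would also explain why `tmp_NCLscan.result.tmp3` was never written before the second traceback.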

I had also run TopHat together with another pipeline that works on TopHat output; that produced no issues and gave me a large number of circular RNAs. However, I also need to check the results with this tool, so kindly help me out.
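Since the log reports "Read 0 genes, 0 transcripts and 138234 exons", one check that might narrow this down is to count the feature types in column 3 of the converted GTF and see how many exon records carry a `gene_name` attribute. The following is a rough sketch under assumptions: the function `summarize_gtf` and the two sample records are made up for illustration, and whether NCLscan actually needs `gene`/`transcript` lines or a `gene_name` attribute is a guess based on the log, not something confirmed here.

```python
import collections
import re


def summarize_gtf(lines):
    """Count feature types (column 3) and exon records lacking gene_name."""
    feature_counts = collections.Counter()
    exons_without_gene_name = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        feature, attributes = fields[2], fields[8]
        feature_counts[feature] += 1
        has_name = re.search(r'\bgene_name\s+"[^"]+"', attributes)
        if feature == "exon" and not has_name:
            exons_without_gene_name += 1
    return feature_counts, exons_without_gene_name


# Two made-up records for illustration; real input would be the GTF file itself.
sample = [
    'chr1\tgffread\texon\t100\t200\t.\t+\t.\ttranscript_id "rna0"; gene_id "gene0";\n',
    'chr1\tgffread\texon\t300\t400\t.\t+\t.\ttranscript_id "rna0"; gene_id "gene0"; gene_name "ABC1";\n',
]
counts, missing = summarize_gtf(sample)
print(dict(counts), missing)
```

To run it on a real annotation, pass an open file handle instead of `sample`, for example `summarize_gtf(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder file name.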