Xinglab / isoCirc

isoCirc
GNU General Public License v3.0

Issue of not generating out file #5

Open braveagle0 opened 3 years ago

braveagle0 commented 3 years ago

I tried to run isoCirc with the test data, and it worked great. However, when I ran it on my own data, it did not generate isocirc.out, isocirc_stats.out, or isocirc.bed. I downloaded the genome FASTA file from Ensembl (http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) and the GTF file also from Ensembl (http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz). The circRNA BED file was downloaded from http://circatlas.biols.ac.cn/.

The output directory contains the following files: cons.fa, cons.fa.sam, high.bam, Homo_sapiens.GRCh38.104.gtf.gene_pred, TotalRNAonly.fa.len, cons.fa.fai, cons.info, Homo_sapiens.GRCh38.104.gtf.bed, low.bam, trf.out

Thanks for your help!

yangao07 commented 3 years ago

Do you have the log information file? That will help me find out why.

Yan

braveagle0 commented 3 years ago

[M::mm_idx_gen::628.794*0.62] collected minimizers
[M::mm_idx_gen::703.737*0.70] sorted minimizers
[M::main::703.740*0.70] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::714.095*0.70] mid_occ = 765
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 194
[M::mm_idx_stat::716.133*0.70] distinct minimizers: 167225302 (35.46% are singletons); average occurrences: 6.030; average spacing: 3.074
[M::worker_pipeline::2221.021*4.31] mapped 188232 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 8 /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output2/cons.fa
[M::main] Real time: 2221.478 sec; CPU: 9580.517 sec; Peak RSS: 20.286 GB
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
== 16:43:39-Jul-01-2021 == [check_dependencies] Checking dependencies ...
== 16:43:41-Jul-01-2021 == [check_dependencies] Checking dependencies done!
== 16:43:41-Jul-01-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 16:43:41-Jul-01-2021 == [fxtools] fxtools sx TotalRNAonly.fa 8 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/ == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.2 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.2; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.2 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.1 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.1; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.1 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.3 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.3; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.3 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.4 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.4; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.4 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.5 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.5; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.5 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.6 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.6; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.6 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] 
trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.7 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.7; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.7 == 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.8 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.8; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.8 == 17:10:51-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.1 >> output2/trf.out; rm output2/trf.out.1 == 17:10:59-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.2 >> output2/trf.out; rm output2/trf.out.2 == 17:11:06-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.3 >> output2/trf.out; rm output2/trf.out.3 == 17:11:10-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.4 >> output2/trf.out; rm output2/trf.out.4 == 17:11:15-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.5 >> output2/trf.out; rm output2/trf.out.5 == 17:11:18-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.6 >> output2/trf.out; rm output2/trf.out.6 == 17:11:21-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.7 >> output2/trf.out; rm output2/trf.out.7 == 17:11:28-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.8 >> output2/trf.out; rm output2/trf.out.8 == 17:11:30-Jul-01-2021 == [fxtools] fxtools lp TotalRNAonly.fa > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.len 2> /dev/null == 17:15:38-Jul-01-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done! == 17:15:38-Jul-01-2021 == [Mapping] Mapping consensus sequence to genome ... 
== 17:15:38-Jul-01-2021 == [Mapping] minimap2 -ax splice -ub --MD --eqx /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output2/cons.fa -t 8 > output2/cons.fa.sam == 17:52:42-Jul-01-2021 == [Mapping] Mapping consensus sequence to genome done! == 17:52:42-Jul-01-2021 == [Classifying] Classifying consensus alignment ... == 17:52:42-Jul-01-2021 == [classify_bam_core] Processing output2/cons.fa.sam ... == 17:52:53-Jul-01-2021 == [classify_bam_core] 100000 BAM records done ... == 17:53:05-Jul-01-2021 == [classify_bam_core] 200000 BAM records done ... == 17:53:16-Jul-01-2021 == [classify_bam_core] 300000 BAM records done ... == 17:53:30-Jul-01-2021 == [classify_bam_core] 400000 BAM records done ... == 17:53:46-Jul-01-2021 == [classify_bam_core] 500000 BAM records done ... == 17:53:53-Jul-01-2021 == [classify_bam_core] Processing output2/cons.fa.sam done. == 17:53:53-Jul-01-2021 == [Classifying] Classifying consensus alignment done! == 17:54:06-Jul-01-2021 == [gtfToGenePred] gtfToGenePred -genePredExt -ignoreGroupsWithoutExons /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred == 17:55:06-Jul-01-2021 == [genePredToBed] genePredToBed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed == 17:55:08-Jul-01-2021 == [get_transcript_from_bed12] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred ... == 17:55:24-Jul-01-2021 == [get_transcript_from_gene_pred] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred done! 
== 17:55:24-Jul-01-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:37-Jul-01-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:37-Jul-01-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:46-Jul-01-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:46-Jul-01-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:55-Jul-01-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:55-Jul-01-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed ...
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 782, in hcBSJ_fullIso
    circ_sj.append(pg.get_back_splice_junction_from_bed(circ_anno_bed, high_bam))
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/parse_gff.py", line 291, in get_back_splice_junction_from_bed
    start = int(ele[bed_header['chromStart']])
ValueError: invalid literal for int() with base 10: 'Start'

braveagle0 commented 3 years ago

Please see the log above and help! Thanks a lot!

yangao07 commented 3 years ago

Based on the error message:

ValueError: invalid literal for int() with base 10: 'Start'

Your BED file human_circRNA_v2.0.bed appears to have a header line, and isoCirc is trying to parse the column title 'Start' as a coordinate. The header line should be removed.
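If it helps, here is a minimal sketch of one way to drop header lines from a BED file before passing it to isoCirc. It is not part of isoCirc; the function names are made up for illustration, and it assumes that any line whose second and third fields are not plain integers is a header or track line that can be discarded.

```python
def is_bed_record(line: str) -> bool:
    """Return True if the line looks like a BED record.

    Assumption: a record has at least three whitespace-separated fields,
    and fields 2 and 3 (chromStart, chromEnd) are non-negative integers.
    """
    fields = line.split()
    if len(fields) < 3:
        return False
    return fields[1].isdigit() and fields[2].isdigit()


def strip_bed_header(lines):
    """Return only the lines that look like BED records, in original order."""
    return [line for line in lines if is_bed_record(line)]


# Hypothetical example input: one header line, one record, one track line.
example = [
    "Chro\tStart\tEnd\tStrand\n",
    "chr1\t100\t200\t+\n",
    "track name=example\n",
]
cleaned = strip_bed_header(example)
print("".join(cleaned), end="")
```

To apply it to a real file, read the lines with `open(path)` and write the result of `strip_bed_header` to a new file, keeping the original as a backup.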

braveagle0 commented 3 years ago

I removed the header and ran isoCirc again. Here is the log:

"[M::mm_idx_gen::130.086*1.12] collected minimizers
[M::mm_idx_gen::140.759*1.61] sorted minimizers
[M::main::140.763*1.61] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::143.284*1.60] mid_occ = 765
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 194
[M::mm_idx_stat::145.268*1.59] distinct minimizers: 167225302 (35.46% are singletons); average occurrences: 6.030; average spacing: 3.074
[M::worker_pipeline::1236.486*7.06] mapped 188232 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 8 /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output_no_head/cons.fa
[M::main] Real time: 1236.718 sec; CPU: 8725.111 sec; Peak RSS: 22.318 GB
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/low.bam'
== 10:51:32-Jul-12-2021 == [check_dependencies] Checking dependencies ...
== 10:51:33-Jul-12-2021 == [check_dependencies] Checking dependencies done!
== 10:51:33-Jul-12-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 10:51:33-Jul-12-2021 == [fxtools] fxtools sx TotalRNAonly.fa 8 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/ == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.1 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.1; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.1 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.2 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.2; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.2 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.3 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.3; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.3 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.4 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.4; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.4 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.5 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.5; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.5 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.6 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.6; rm 
/mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.6 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.7 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.7; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.7 == 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.8 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.8; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.8 == 11:17:41-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.1 >> output_no_head/trf.out; rm output_no_head/trf.out.1 == 11:17:46-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.2 >> output_no_head/trf.out; rm output_no_head/trf.out.2 == 11:17:50-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.3 >> output_no_head/trf.out; rm output_no_head/trf.out.3 == 11:17:51-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.4 >> output_no_head/trf.out; rm output_no_head/trf.out.4 == 11:17:53-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.5 >> output_no_head/trf.out; rm output_no_head/trf.out.5 == 11:17:56-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.6 >> output_no_head/trf.out; rm output_no_head/trf.out.6 == 11:18:00-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.7 >> output_no_head/trf.out; rm output_no_head/trf.out.7 == 11:18:02-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.8 >> output_no_head/trf.out; rm output_no_head/trf.out.8 == 11:18:04-Jul-12-2021 == [fxtools] fxtools lp TotalRNAonly.fa > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.len 2> /dev/null == 
11:22:30-Jul-12-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done! == 11:22:30-Jul-12-2021 == [Mapping] Mapping consensus sequence to genome ... == 11:22:30-Jul-12-2021 == [Mapping] minimap2 -ax splice -ub --MD --eqx /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output_no_head/cons.fa -t 8 > output_no_head/cons.fa.sam == 11:43:09-Jul-12-2021 == [Mapping] Mapping consensus sequence to genome done! == 11:43:09-Jul-12-2021 == [Classifying] Classifying consensus alignment ... == 11:43:09-Jul-12-2021 == [classify_bam_core] Processing output_no_head/cons.fa.sam ... == 11:43:30-Jul-12-2021 == [classify_bam_core] 100000 BAM records done ... == 11:43:51-Jul-12-2021 == [classify_bam_core] 200000 BAM records done ... == 11:44:12-Jul-12-2021 == [classify_bam_core] 300000 BAM records done ... == 11:44:27-Jul-12-2021 == [classify_bam_core] 400000 BAM records done ... == 11:44:44-Jul-12-2021 == [classify_bam_core] 500000 BAM records done ... == 11:44:51-Jul-12-2021 == [classify_bam_core] Processing output_no_head/cons.fa.sam done. == 11:44:51-Jul-12-2021 == [Classifying] Classifying consensus alignment done! == 11:45:03-Jul-12-2021 == [gtfToGenePred] gtfToGenePred -genePredExt -ignoreGroupsWithoutExons /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred == 11:46:12-Jul-12-2021 == [genePredToBed] genePredToBed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed == 11:46:15-Jul-12-2021 == [get_transcript_from_bed12] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred ... 
== 11:46:28-Jul-12-2021 == [get_transcript_from_gene_pred] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred done! == 11:46:28-Jul-12-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ... == 11:46:42-Jul-12-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done! == 11:46:42-Jul-12-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ... == 11:46:53-Jul-12-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done! == 11:46:53-Jul-12-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ... == 11:47:03-Jul-12-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done! == 11:47:03-Jul-12-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed ... == 11:47:11-Jul-12-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed done! == 11:47:13-Jul-12-2021 == [read_wise_eval] Generating read-wise evaluation result ... == 11:58:03-Jul-12-2021 == [high_quality] 100000 high mapping quality BAM records have been processed ... == 12:03:09-Jul-12-2021 == [read_wise_eval] Generating read-wise evaluation result done! 
== 12:03:09-Jul-12-2021 == [filter_circRNA_read] Filtering back-splice-junctions ... == 12:03:13-Jul-12-2021 == [filter_circRNA_read] Filtering back-splice-junctions done! == 12:03:13-Jul-12-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ... == 12:03:19-Jul-12-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done! == 12:03:19-Jul-12-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ... == 12:03:19-Jul-12-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done! == 12:03:19-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed output_no_head/isocirc.bed.exon.gtf == 12:03:20-Jul-12-2021 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.exon.gtf == 12:03:46-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene.bed == 12:04:04-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.cds.gtf == 12:04:23-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.utr.gtf == 12:04:40-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type 
"lincRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.lincRNA.gtf == 12:05:10-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.antisense.gtf == 12:05:34-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.rRNA.gtf == 12:07:43-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed.five.site.bed output_no_head/isocirc.bed.five.site.exon.gtf == 12:07:45-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed.three.site.bed output_no_head/isocirc.bed.three.site.exon.gtf == 12:07:46-Jul-12-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf == 12:07:52-Jul-12-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf == 12:07:56-Jul-12-2021 == [itst_gtf_gtf] itst_gtf_gtf output_no_head/isocirc.bed.five.site.exon.gtf 
/mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf output_no_head/isocirc.bed.five.site.gene.out
== 12:08:03-Jul-12-2021 == [itst_gtf_gtf] itst_gtf_gtf output_no_head/isocirc.bed.three.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf output_no_head/isocirc.bed.three.site.gene.out
== 12:08:09-Jul-12-2021 == [gtf2gene] gtf2gene output_no_head/isocirc.bed.exon.gtf /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf output_no_head/isocirc.bed.ovlp.gene.out
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 826, in hcBSJ_fullIso
    itst_out_dict = intersect_with_bed(out_dir, circRNA_bed, all_anno, all_anno_bed, itst_anno_dict, flank_len, bedtools)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 414, in intersect_with_bed
    get_ovlp_gene_name_id(ovlp_gene_name_id, gene_id_dict, gene_name_dict, gene_strand_dict)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 214, in get_ovlp_gene_name_id
    strand_dict[ele[0]] = ele[3] if strand_dict[ele[0]] == 'NA' else strand_dict[ele[0]] + ',' + ele[3]
IndexError: list index out of range"

Would you please help?

Thanks!

yangao07 commented 3 years ago

Can you show me a few lines of output_no_head/isocirc.bed.ovlp.gene.out?

braveagle0 commented 3 years ago

Here are a few lines of the file:

isocirc0 ENSG00000230021 -
isocirc10000 ENSG00000204390 HSPA1L -
isocirc10001 ENSG00000204371 EHMT2 -
isocirc10002 ENSG00000213676 ATF6B -
isocirc10003 ENSG00000213676 ATF6B -
isocirc10004 ENSG00000223501 VPS52 -
isocirc10005 ENSG00000124493 GRM4 -
isocirc10006 ENSG00000124493 GRM4 -
isocirc10007 ENSG00000124493 GRM4 -
isocirc10008 ENSG00000124493 GRM4 -
isocirc10009 ENSG00000270800 RPS10-NUDT3 -
isocirc10009 ENSG00000272325 NUDT3 -
isocirc1000 ENSG00000143473 KCNH1 -
isocirc1000 ENSG00000283952 -
isocirc1000 ENSG00000284299 -
isocirc10010 ENSG00000124507 PACSIN1 +

yangao07 commented 3 years ago

It seems that some of the genes in your GTF file do not have a gene name. Can you run grep ENSG00000230021 Homo_sapiens.GRCh38.104.gtf and paste the output here?
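A quick way to list the affected rows in isocirc.bed.ovlp.gene.out is sketched below. It is only an illustration, not isoCirc code: the function name is hypothetical, and it assumes the column layout is isoform ID, gene ID, gene name, strand (inferred from the rows posted above and from the traceback reading the strand from `ele[3]`).

```python
def rows_missing_gene_name(lines):
    """Return the rows that have fewer than four whitespace-separated fields.

    Assumed layout (not confirmed by the isoCirc source):
    isoform ID, gene ID, gene name, strand.
    A row without a gene name has only three fields, so reading the
    fourth field from it would raise IndexError.
    """
    short_rows = []
    for line in lines:
        fields = line.split()
        if fields and len(fields) < 4:
            short_rows.append(fields)
    return short_rows


# Rows copied from the output posted earlier in this thread.
sample = [
    "isocirc0 ENSG00000230021 -",
    "isocirc10000 ENSG00000204390 HSPA1L -",
    "isocirc1000 ENSG00000283952 -",
]
for fields in rows_missing_gene_name(sample):
    print(fields[0], fields[1])
```

Running the same function over the lines of the real file would show every isoform whose overlapping gene has no name in the annotation.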

braveagle0 commented 3 years ago

1 havana transcript 720053 724564 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; transcript_support_level "2 (assigned to previous version 1)";
1 havana exon 724358 724564 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; exon_number "1"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001688006"; exon_version "2"; transcript_support_level "2 (assigned to previous version 1)";
1 havana exon 720053 720200 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; exon_number "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001675630"; exon_version "2"; transcript_support_level "2 (assigned to previous version 1)";
[guans@login-0-1 GRCh38]$ grep ENSG00000230021 Homo_sapiens.GRCh38.104.gtf
1 havana gene 586071 827796 . - . gene_id "ENSG00000230021"; gene_version "10"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene";
1 havana transcript 586071 612813 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";
1 havana exon 612741 612813 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; exon_number "1"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003812707"; exon_version "1"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";
1 havana exon 607955 608056 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; exon_number "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001718533"; exon_version "1"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";

yangao07 commented 3 years ago

I see. Some genes in your GTF file, such as ENSG00000230021, have no "gene_name" attribute, which is why isoCirc hit an error.

I just updated the related script. You can try the latest version of isoCirc (v1.0.4); it should work now.
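For anyone checking their own annotation, here is a rough sketch of how a missing gene_name can be detected in a GTF line, falling back to gene_id. This is not the change made in isoCirc; the helper names are hypothetical, and the sketch assumes a standard nine-column, tab-separated GTF line whose attributes follow the `key "value";` pattern seen in the grep output above.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(attribute_field: str) -> dict:
    """Parse the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_RE.findall(attribute_field))


def gene_name_or_id(gtf_line: str) -> str:
    """Return gene_name if present, otherwise fall back to gene_id."""
    columns = gtf_line.rstrip("\n").split("\t")
    attributes = parse_gtf_attributes(columns[8])
    return attributes.get("gene_name", attributes["gene_id"])


# Gene line taken from the grep output in this thread (no gene_name).
unnamed = "\t".join([
    "1", "havana", "gene", "586071", "827796", ".", "-", ".",
    'gene_id "ENSG00000230021"; gene_version "10"; gene_source "havana"; '
    'gene_biotype "transcribed_processed_pseudogene";',
])
# Hypothetical line with placeholder coordinates, only to show the named case.
named = "\t".join([
    "6", "example", "gene", "1", "2", ".", "-", ".",
    'gene_id "ENSG00000124493"; gene_name "GRM4";',
])
print(gene_name_or_id(unnamed))
print(gene_name_or_id(named))
```

Falling back to the gene ID keeps every row at the same number of fields, which is one way to avoid the short rows seen in isocirc.bed.ovlp.gene.out.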

braveagle0 commented 3 years ago

I tried v1.0.4 and still encounter some errors. "== 12:24:05-Jul-15-2021 == [read_wise_eval] Generating read-wise evaluation result ... == 12:37:01-Jul-15-2021 == [high_quality] 100000 high mapping quality BAM records have been processed ... == 12:43:43-Jul-15-2021 == [read_wise_eval] Generating read-wise evaluation result done! == 12:43:43-Jul-15-2021 == [filter_circRNA_read] Filtering back-splice-junctions ... == 12:43:47-Jul-15-2021 == [filter_circRNA_read] Filtering back-splice-junctions done! == 12:43:47-Jul-15-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ... == 12:43:52-Jul-15-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done! == 12:43:52-Jul-15-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ... == 12:43:53-Jul-15-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done! == 12:43:53-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed output_104/isocirc.bed.exon.gtf == 12:43:56-Jul-15-2021 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.exon.gtf == 12:44:39-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.gene.bed == 12:45:26-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.cds.gtf == 12:46:09-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' 
/mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.utr.gtf == 12:46:53-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type "lincRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.lincRNA.gtf == 12:47:42-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.antisense.gtf == 12:48:30-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.rRNA.gtf == 12:55:15-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed.five.site.bed output_104/isocirc.bed.five.site.exon.gtf == 12:55:19-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed.three.site.bed output_104/isocirc.bed.three.site.exon.gtf == 12:55:22-Jul-15-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf == 12:55:35-Jul-15-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.bed 
/mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf
== 12:55:46-Jul-15-2021 == [itst_gtf_gtf] itst_gtf_gtf output_104/isocirc.bed.five.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf output_104/isocirc.bed.five.site.gene.out
== 12:56:12-Jul-15-2021 == [itst_gtf_gtf] itst_gtf_gtf output_104/isocirc.bed.three.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf output_104/isocirc.bed.three.site.gene.out
== 12:56:38-Jul-15-2021 == [gtf2gene] gtf2gene output_104/isocirc.bed.exon.gtf /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf output_104/isocirc.bed.ovlp.gene.out
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 826, in hcBSJ_fullIso
    itst_out_dict = intersect_with_bed(out_dir, circRNA_bed, all_anno, all_anno_bed, itst_anno_dict, flank_len, bedtools)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 414, in intersect_with_bed
    get_ovlp_gene_name_id(ovlp_gene_name_id, gene_id_dict, gene_name_dict, gene_strand_dict)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 214, in get_ovlp_gene_name_id
    strand_dict[ele[0]] = ele[3] if strand_dict[ele[0]] == 'NA' else strand_dict[ele[0]] + ',' + ele[3]
IndexError: list index out of range"

braveagle0 commented 3 years ago

Would you mind sharing where you downloaded your .fa, .gtf, and .bed files? Thanks!