Xinglab / isoCirc

isoCirc
GNU General Public License v3.0

can't get anno block #10

Open sidizhao opened 1 year ago

sidizhao commented 1 year ago

Hi there,

I've been using isoCirc successfully for a while, but this week, after installing v1.0.6, it started producing errors towards the end of the run. I get the isocirc.bed output but not isocirc.out. Here's the full log:

Matplotlib created a temporary config/cache directory at /tmp/977627.tmpdir/matplotlib-z2gis1si because the default path (/home/s.zhao/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing. == 06:19:32-Feb-09-2023 == [check_dependencies] Checking dependencies ... == 06:19:32-Feb-09-2023 == [check_dependencies] Checking dependencies done! == 06:19:33-Feb-09-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ... == 06:19:33-Feb-09-2023 == [Tandem Repeats Finder] trf409.legacylinux64 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/722_primary_pacbio_long_corrected.0.fa 2 7 7 80 10 100 2000 -h -ngs > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/trf.out == 14:35:27-Feb-09-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done! == 14:35:27-Feb-09-2023 == [Mapping] Mapping consensus sequence to genome ... 
== 14:35:27-Feb-09-2023 == [Mapping] minimap2 -ax splice -ub --MD --eqx /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa -t 1 > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam [M::mm_idx_gen::135.0830.90] collected minimizers [M::mm_idx_gen::238.3200.89] sorted minimizers [M::main::238.4270.89] loaded/built the index for 455 target sequence(s) [M::mm_mapopt_update::243.5150.89] mid_occ = 792 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 455 [M::mm_idx_stat::245.9300.89] distinct minimizers: 167291034 (34.68% are singletons); average occurrences: 6.239; average spacing: 3.075 [M::worker_pipeline::1077.0800.85] mapped 94741 sequences [M::main] Version: 2.17-r941 [M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 1 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa [M::main] Real time: 1079.441 sec; CPU: 913.986 sec; Peak RSS: 18.981 GB == 14:53:27-Feb-09-2023 == [Mapping] Mapping consensus sequence to genome done! == 14:53:27-Feb-09-2023 == [Classifying] Classifying consensus alignment ... == 14:53:27-Feb-09-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam ... == 14:53:31-Feb-09-2023 == [classify_bam_core] 100000 BAM records done ... 
== 14:53:33-Feb-09-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam done. == 14:53:33-Feb-09-2023 == [Classifying] Classifying consensus alignment done! == 14:53:35-Feb-09-2023 == [gtfToGenePred] gtfToGenePred -ignoreGroupsWithoutExons /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene_pred == 14:55:19-Feb-09-2023 == [genePredToBed] genePredToBed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene_pred /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed == 14:56:10-Feb-09-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf ... == 14:56:31-Feb-09-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf done! == 14:56:31-Feb-09-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ... 
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam' == 14:56:41-Feb-09-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done! == 14:56:41-Feb-09-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ... [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam' == 14:56:49-Feb-09-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done! == 14:56:49-Feb-09-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ... 
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam' == 14:56:57-Feb-09-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done! == 14:56:57-Feb-09-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/patient_722_short_read_annotation.bed ... [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam' == 14:56:57-Feb-09-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/patient_722_short_read_annotation.bed done! [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam' [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/low.bam' == 14:56:58-Feb-09-2023 == [read_wise_eval] Generating read-wise evaluation result ... == 14:58:02-Feb-09-2023 == [read_wise_eval] Generating read-wise evaluation result done! == 14:58:02-Feb-09-2023 == [filter_circRNA_read] Filtering back-splice-junctions ... 
== 14:58:03-Feb-09-2023 == [filter_circRNA_read] Filtering back-splice-junctions done! == 14:58:03-Feb-09-2023 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ... == 14:58:03-Feb-09-2023 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done! == 14:58:03-Feb-09-2023 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ... == 14:58:03-Feb-09-2023 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done! == 14:58:13-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf == 14:58:18-Feb-09-2023 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.exon.gtf == 14:58:29-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene.bed == 14:58:43-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.cds.gtf == 14:58:58-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.utr.gtf == 14:59:11-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type "lincRNA"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.lincRNA.gtf == 14:59:29-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.antisense.gtf == 14:59:51-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.rRNA.gtf == 15:01:14-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.exon.gtf == 15:01:23-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.exon.gtf == 15:01:26-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.exon.gtf == 15:01:47-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.bed 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.exon.gtf == 15:02:13-Feb-09-2023 == [itst_gtf_gtf] itst_gtf_gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.gene.out == 15:02:18-Feb-09-2023 == [itst_gtf_gtf] itst_gtf_gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.gene.out == 15:02:22-Feb-09-2023 == [gtf2gene] gtf2gene /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.ovlp.gene.out == 15:02:46-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.cds.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.CDS.out == 15:02:52-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.utr.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.UTR.out == 15:02:53-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.lincRNA.gtf 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.lincRNA.out == 15:02:54-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.antisense.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.antisense.out == 15:02:55-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.rRNA.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.rRNA.out == 15:02:56-Feb-09-2023 == [itst_intron] bedtools intersect -v -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed -split > 
/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.intron.out == 15:02:58-Feb-09-2023 == [itst_intergenic] bedtools intersect -v -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene.bed > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.intergenic.out == 15:02:59-Feb-09-2023 == [itst_exon] bedtools intersect -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.exon.gtf -wa -wb > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.out == 15:03:08-Feb-09-2023 == [get_block_anno] No "exon_number" found in record.

I checked the logs from previous successful runs and there was no [get_block_anno] step in them; they go straight from [itst_exon] to [output_isoform_eval]. My GTF file does have exon_number and exon_id for most of the transcripts. (Do I need to clean up my GTF further? It used to work fine.)

Here are a few lines:

chr1 ensGene exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; exon_id "ENST00000456328.1"; gene_name "ENSG00000223972";
chr1 ensGene exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; exon_id "ENST00000456328.2"; gene_name "ENSG00000223972";
chr1 ensGene exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; exon_id "ENST00000456328.3"; gene_name "ENSG00000223972";

Also, I notice that some of the steps in [gtf2bed] require "gene_type" or "gene_biotype" in them. Should the GTF file include the biotypes as well?

yangao07 commented 1 year ago

Based on this log, isoCirc is trying to get "exon_number" from "/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.out", which is expected to be like this:

chr16   isocirc exon    66625   66738   .   +   .   gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1";    chr16   havana  exon    66537   66738   .   -   .   gene_id "ENSG00000234769"; gene_version "4"; transcript_id "ENST00000326592"; transcript_version "9"; exon_number "6"; gene_name "WASH4P"; gene_source "havana"; gene_biotype "protein_coding"; transcript_name "WASH4P-001"; transcript_source "havana"; transcript_biotype "protein_coding"; havana_transcript "OTTHUMT00000133175"; havana_transcript_version "2"; exon_id "ENSE00001686309"; exon_version "1"; tag "basic";

As for gene_type, it is not required.

sidizhao commented 1 year ago

Here's that file:

output_0$ head isocirc.bed.exon.out
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000258229"; exon_number "20"; exon_id "ENST00000258229.20"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000462233"; exon_number "19"; exon_id "ENST00000462233.19"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000475463"; exon_number "8"; exon_id "ENST00000475463.8"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000488780"; exon_number "7"; exon_id "ENST00000488780.7"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000430153"; exon_number "7"; exon_id "ENST00000430153.7"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198967 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000518351"; exon_number "1"; exon_id "ENST00000518351.1"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233199020 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000517808"; exon_number "1"; exon_id "ENST00000517808.1"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "A6NKB5"; transcript_id "ENST00000258229.14"; exon_number "20"; exon_id "ENST00000258229.14.20"; gene_name "A6NKB5";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "H0YB15"; transcript_id "ENST00000462233.5"; exon_number "19"; exon_id "ENST00000462233.5.19"; gene_name "H0YB15";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "H0YBF4"; transcript_id "ENST00000475463.6"; exon_number "8"; exon_id "ENST00000475463.6.8"; gene_name "H0YBF4";

yangao07 commented 1 year ago

Also, you mentioned that this ran successfully until the recent update to v1.0.6. That is odd, because nothing related to this part has changed.

sidizhao commented 1 year ago

For this run I concatenated more custom entries onto the GTF, and I checked that those entries have exon_number and exon_id as well. I am quite confused too.

yangao07 commented 1 year ago

I see, but there must be some lines without "exon_number" for this error to occur.
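One way to locate such lines is to scan the annotation for exon records whose attribute column has no exon_number. The following is only a rough sketch, not part of isoCirc: the function name `exons_missing_exon_number` is made up for illustration, and it assumes a standard tab-separated GTF with the attributes in column 9.

```python
import re

# Matches an exon_number attribute, quoted or unquoted (e.g. exon_number "3";)
EXON_NUMBER = re.compile(r'exon_number\s+"?\d+"?')


def exons_missing_exon_number(gtf_lines):
    """Yield (line_number, line) for exon records that lack exon_number.

    Hypothetical helper: assumes tab-separated GTF, feature type in
    column 3 and attributes in column 9.
    """
    for n, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are checked
        if not EXON_NUMBER.search(fields[8]):
            yield n, line.rstrip("\n")
```

Passing an open file handle for the GTF to the function and printing each yielded pair lists the offending records with their line numbers.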

sidizhao commented 1 year ago

I think I know where the problem is. I looked at the same circRNA, isocirc1, which was detected in both the old run and the new run. From the old isocirc.out:

isocirc1 chr10 34422057 34422259 NA NA NA 1 202 0 202 N NA False,False NA NA NA False False True +GT/AG True NNC FSM NA NA NA NA NA NA NA NA NA 1 m64043_220730_094118/30345339/ccs

In this case, it seems like it didn't really get a successful annotation but outputted the file anyway. Is there a particular reason why this new run isn't doing the same? I'm looking at the intermediate files of the new run:

$ cat isocirc.bed.ovlp.gene.out
isocirc1 G009115 G009115 +

I think G009115 is one of the newer transcripts I added on, which means in the old run it wasn't getting recognized. By searching through the new annotation:

$ grep G009115 hg38_with_maher_lab_lncrna.gtf
chr10 mitranscriptome gene 34417023 34459184 . + . gene_id "G009115"; gene_name "Unknown"
chr10 mitranscriptome transcript 34417023 34436597 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34417023 34417308 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome transcript 34417023 34459184 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34417023 34417308 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34435248 34436597 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34458756 34459184 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"

Do you think if I added exon number and exon id to these transcripts, it'll rectify the problem?

yangao07 commented 1 year ago

Yes, you should try that.
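For anyone hitting the same error, here is a minimal sketch of how the missing attributes could be added. It is not isoCirc code and `add_exon_numbers` is a hypothetical name. It assumes a tab-separated GTF, that a transcript either has exon_number on all of its exons or on none, that exons are numbered in transcript order (ascending start on the + strand, descending on the - strand), and that exon_id follows the `<transcript_id>.<exon_number>` pattern seen in the GTF lines quoted earlier in this thread.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def add_exon_numbers(gtf_lines):
    """Return GTF lines with exon_number/exon_id appended where missing.

    Hypothetical helper; the numbering and exon_id conventions are
    assumptions stated in the text above.
    """
    lines = [line.rstrip("\n") for line in gtf_lines]
    exons = defaultdict(list)  # transcript_id -> [(start, line index)]
    strand = {}                # transcript_id -> "+" or "-"
    for i, line in enumerate(lines):
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        if "exon_number" in fields[8]:
            continue  # already annotated, leave untouched
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # cannot group an exon without a transcript_id
        exons[match.group(1)].append((int(fields[3]), i))
        strand[match.group(1)] = fields[6]
    for tx, records in exons.items():
        # Transcript order: left to right on +, right to left on -
        records.sort(reverse=(strand[tx] == "-"))
        for number, (_, i) in enumerate(records, start=1):
            attrs = lines[i].rstrip().rstrip(";")
            lines[i] = f'{attrs}; exon_number "{number}"; exon_id "{tx}.{number}";'
    return lines
```

Records that already carry exon_number, and all non-exon records, are returned unchanged, so the function can be run over the whole concatenated annotation.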

sidizhao commented 1 year ago

Resolved. Thank you! Is there a way to make the short-read correction step a separate command? My compute cluster has a hard time running the entire process at once, so I typically end up having to break long_corrected.fa into smaller files and rerun the isocirc command without the correction.

yangao07 commented 1 year ago

You can run lordec (or any long-read correction tool) separately if you have matched short-read data to correct the long-read data, and then use the corrected long reads as input.
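If the corrected reads still need to be processed in smaller batches, the splitting step mentioned above could be sketched roughly as follows. This is an illustration only: `split_fasta` and `write_chunks` are hypothetical helpers, not isoCirc commands, and the `<prefix>.<index>.fa` naming simply mirrors the chunk file names visible in the logs in this thread.

```python
def split_fasta(lines, records_per_chunk):
    """Yield lists of FASTA lines, each holding at most records_per_chunk records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk, count = [], 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts; flush the chunk first if it is full
            if count == records_per_chunk:
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk


def write_chunks(fasta_path, prefix, records_per_chunk):
    """Write each chunk to <prefix>.<index>.fa and return the paths written."""
    paths = []
    with open(fasta_path) as handle:
        for index, chunk in enumerate(split_fasta(handle, records_per_chunk)):
            path = f"{prefix}.{index}.fa"
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Records are never split across files, since a chunk is only closed when the next header line is seen.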

sidizhao commented 1 year ago

Okay I'll keep that in mind.

Actually I just ran into some small problems. Since I've broken the fasta file up, some of the smaller files aren't finishing the job, whereas some of them did finish and produced results. Here's one example, and it seems that it just gets cut off after [read_wise_eval] started. Is this normal?

Matplotlib created a temporary config/cache directory at /tmp/995513.tmpdir/matplotlib-gp59czes because the default path (/home/s.zhao/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing. == 11:25:49-Feb-10-2023 == [check_dependencies] Checking dependencies ... == 11:25:49-Feb-10-2023 == [check_dependencies] Checking dependencies done! == 11:25:49-Feb-10-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ... == 11:25:50-Feb-10-2023 == [Tandem Repeats Finder] trf409.legacylinux64 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/hct116_FAR20705_nanopore_long_corrected.16.fa 2 7 7 80 10 100 2000 -h -ngs > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/trf.out == 13:09:15-Feb-10-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done! == 13:09:15-Feb-10-2023 == [Mapping] Mapping consensus sequence to genome ... 
== 13:09:15-Feb-10-2023 == [Mapping] minimap2 -ax splice -ub --MD --eqx /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa -t 1 > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam [M::mm_idx_gen::107.0280.99] collected minimizers [M::mm_idx_gen::188.9410.99] sorted minimizers [M::main::188.9490.99] loaded/built the index for 455 target sequence(s) [M::mm_mapopt_update::192.5350.99] mid_occ = 792 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 455 [M::mm_idx_stat::194.8580.99] distinct minimizers: 167291034 (34.68% are singletons); average occurrences: 6.239; average spacing: 3.075 [M::worker_pipeline::640.6560.99] mapped 101445 sequences [M::main] Version: 2.17-r941 [M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 1 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa [M::main] Real time: 642.341 sec; CPU: 636.824 sec; Peak RSS: 18.981 GB == 13:19:57-Feb-10-2023 == [Mapping] Mapping consensus sequence to genome done! == 13:19:57-Feb-10-2023 == [Classifying] Classifying consensus alignment ... == 13:19:57-Feb-10-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam ... == 13:19:59-Feb-10-2023 == [classify_bam_core] 100000 BAM records done ... 
== 13:20:01-Feb-10-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam done. == 13:20:01-Feb-10-2023 == [Classifying] Classifying consensus alignment done! == 13:20:03-Feb-10-2023 == [gtfToGenePred] gtfToGenePred -ignoreGroupsWithoutExons /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.gene_pred == 13:20:27-Feb-10-2023 == [genePredToBed] genePredToBed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.gene_pred /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed == 13:20:30-Feb-10-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf ... == 13:20:44-Feb-10-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf done! 
== 13:20:44-Feb-10-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ... [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam' == 13:20:52-Feb-10-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done! == 13:20:52-Feb-10-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ... [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam' == 13:20:58-Feb-10-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done! == 13:20:58-Feb-10-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ... 
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam' == 13:21:06-Feb-10-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done! == 13:21:06-Feb-10-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/HCT116_short_read_annotation.bed ... [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam' == 13:21:06-Feb-10-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/HCT116_short_read_annotation.bed done! [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam' [E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/low.bam' == 13:21:08-Feb-10-2023 == [read_wise_eval] Generating read-wise evaluation result ... 
Traceback (most recent call last):
  File "/usr/local/bin/miniconda3/bin/isocirc", line 219, in <module>
    main()
  File "/usr/local/bin/miniconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/usr/local/bin/miniconda3/bin/isocirc", line 132, in isocirc_core
    hf.hcBSJ_fullIso(high_bam, low_bam, long_len_fn, cons_info, cons_fa,
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/hcBSJ_fullIso.py", line 797, in hcBSJ_fullIso
    eval_core(processed_cnt, r, cons_info_dict, ref_fa, cons_fa, all_site, all_exon, all_sj, circ_sj, sj_xid, key_sj_xid, site_dis, end_dis, all_out)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/hcBSJ_fullIso.py", line 742, in eval_core
    is_known_bsj, is_cano_bsj, dis_to_cano_bsj, bsj_motif, align_bsj = pg.is_known_cano_bsj(bsj, circ_sj, ref_seq, cons_fa[r.query_name][:].seq.upper(), int(eval_out['startCoor0based']), int(eval_out['endCoor']), r.is_reverse, r.cigartuples, int(eval_out['refMapLen']), int(eval_out['consMapLen']), int(eval_out['consLen']), end_dis, force_strand, bsj_dis_to_known_ss)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_gff.py", line 771, in is_known_cano_bsj
    score, alignBSJ1 = get_cano_bsj_align(up_dis1, down_dis1, strand1, ref_seq, read_seq, start, end, end_dis, is_reverse, cigartuples, ref_map_len, cons_len)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_gff.py", line 673, in get_cano_bsj_align
    return pb.pairwise_align(bsj_ref_seq, bsj_read_seq, 'g', True)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_bam.py", line 103, in pairwise_align
    return res.score, get_cigar_from_pairwise_res(r.format())
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_bam.py", line 80, in get_cigar_from_pairwise_res
    cigartuples.append((cigar_op_dict[op], 1))
UnboundLocalError: local variable 'op' referenced before assignment

yangao07 commented 1 year ago

This looks very weird. Can you upload your data here, both the long reads and the annotation file, so that I can try to track down this error?

sidizhao commented 1 year ago

Please merge the GTF files, as GitHub has a file size limit. Thank you for being so patient with me!

github_debug00.gtf.gz github_debug01.gtf.gz github_debug06.gtf.gz github_debug05.gtf.gz github_debug04.gtf.gz github_debug03.gtf.gz github_debug02.gtf.gz
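
The parts listed above can be recombined with a short script. This is a minimal sketch, assuming the seven files are consecutive gzip-compressed pieces of a single GTF that should be joined in the numeric order of their names (00 to 06). The helper name merge_gzipped_parts and the output name merged.gtf are made up for illustration.

```python
import gzip
import shutil
from pathlib import Path


def merge_gzipped_parts(parts, merged_path):
    """Decompress each part in the given order and append it to merged_path."""
    with open(merged_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Assumption: the downloaded parts sit in the current directory.
    # Sorting by name puts github_debug00 ... github_debug06 in order.
    parts = sorted(Path(".").glob("github_debug*.gtf.gz"))
    if parts:
        merge_gzipped_parts(parts, "merged.gtf")
        print(f"merged {len(parts)} parts into merged.gtf")
    else:
        print("no github_debug*.gtf.gz files found")
```

Sorting matters here because the attachments are not listed in numeric order in the comment.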

hct116_FAR20705_nanopore_long_corrected.14.fa.gz

sidizhao commented 1 year ago

I tried again with the newest push (pip install isocirc==1.0.6a0), and the same error persists for this file.

sidizhao commented 1 year ago

Hi, just to follow up on this issue. Were you able to take a look at what could've potentially triggered this error?

yangao07 commented 1 year ago

Which circRNA bed file did you use as input?

sidizhao commented 1 year ago

HCT116_short_read_annotation.bed.zip

yangao07 commented 1 year ago

I don't see the error message with this command:

isocirc /home/gaoy1/sdata/isocirc_debug/hct116_FAR20705_nanopore_long_corrected.14.fa /home/gaoy1/data/genome/hg38/hg38.fa /home/gaoy1/sdata/isocirc_debug/debug.gtf /home/gaoy1/sdata/isocirc_debug/HCT116_short_read_annotation.bed /home/gaoy1/sdata/isocirc_debug/output -t32

Seems very weird.

sidizhao commented 1 year ago

Yeah I don't quite understand why it would generate an error because other parts of the fasta file have successfully completed running. Do you have an inkling of why that specific "UnboundLocalError: local variable 'op' referenced before assignment" would happen? In the meantime I will also try to ask the IT people maintaining the cluster and see if it's on our end.
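
For context on the question above: Python raises UnboundLocalError when a function reads a local name on a code path where no assignment to it has run yet. The traceback ends at cigartuples.append((cigar_op_dict[op], 1)), so one plausible cause is that op is only assigned inside branches that were all skipped for some input. The sketch below is a hypothetical illustration of that pattern, not isoCirc's actual code; the function names and the marker-to-operation mapping are assumptions.

```python
# Hypothetical illustration only; this is NOT isoCirc's get_cigar_from_pairwise_res.
MARKER_TO_OP = {"|": "=", ".": "X", "-": "D"}  # assumed mapping, for illustration


def ops_fragile(marker_line):
    """Assigns `op` only inside the branches, so unknown input can leave it unbound."""
    ops = []
    for ch in marker_line:
        if ch == "|":
            op = "="
        elif ch == ".":
            op = "X"
        elif ch == "-":
            op = "D"
        # No else branch: for any other character `op` keeps its previous
        # value, or is unbound if no earlier character matched.
        ops.append(op)
    return ops


def ops_checked(marker_line):
    """Same mapping, but unexpected input fails with an explicit message."""
    ops = []
    for ch in marker_line:
        if ch not in MARKER_TO_OP:
            raise ValueError(f"unrecognized marker {ch!r} in {marker_line!r}")
        ops.append(MARKER_TO_OP[ch])
    return ops


if __name__ == "__main__":
    print(ops_fragile("||.-"))
    try:
        ops_fragile("target ACGT")
    except UnboundLocalError as err:
        print("UnboundLocalError:", err)
```

If something like this is happening, the error would depend on the exact text being parsed, which would be consistent with it appearing only for some inputs or environments.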

yangao07 commented 1 year ago

Can you try re-installing isocirc from the latest source (not via pip install) and re-running it on this dataset? I added an error message related to this error.

sidizhao commented 1 year ago

Alright I'll get back to you. I've been running it on a docker image I built. Will change pip to git and try again.

sidizhao commented 1 year ago

== 22:04:10-Feb-15-2023 == [read_wise_eval] Generating read-wise evaluation result ... == 22:04:10-Feb-15-2023 == [get_cigar_from_pairwise_res] Unexpected alignment string: target TCATAAAACGTTACTTAAAA 0.

It now shows this.

I tried to look for "TCATAAAACGTTACTTAAAA" in any of the intermediate files and it's not showing up.

yangao07 commented 1 year ago

Can you try pip show biopython? Seems like you are using the old version of biopython.

sidizhao commented 1 year ago

Name: biopython
Version: 1.78

yangao07 commented 1 year ago

The new version requires biopython >= 1.79. This is why the error comes up.
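
The same check can be scripted so it runs before a long job starts. This is a minimal sketch using only the standard library; the 1.79 minimum comes from the comment above, while the helper names version_tuple and biopython_ok are made up for illustration. It should be run with the same Python interpreter that runs isocirc.

```python
import re
from importlib import metadata

MIN_BIOPYTHON = (1, 79)  # minimum stated in this thread


def version_tuple(version):
    """Turn a version string such as '1.78' into (1, 78), stopping at the
    first dotted piece that does not start with a digit."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def biopython_ok(installed, minimum=MIN_BIOPYTHON):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("biopython")
    except metadata.PackageNotFoundError:
        print("biopython is not installed in this environment")
    else:
        status = "OK" if biopython_ok(installed) else "too old"
        print(f"biopython {installed}: {status}")
```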

sidizhao commented 1 year ago

Should I specify that when I build the docker? The only ones I had installed other than isocirc were bedtools and minimap2.

yangao07 commented 1 year ago

I am not familiar with Docker. Usually, there should be no problem since it is listed in the requirement.txt. You can try to re-install everything.

sidizhao commented 1 year ago

Yeah I think the docker image is still pulling the local 1.78 version for some reason. I'm working on fixing that. Hopefully this will fix everything.
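
One way to confirm which copy of a package an environment actually imports is to print the module's version and file path from inside that environment. This is a minimal sketch; describe_module is a hypothetical helper, and "Bio" is the import name of the biopython distribution.

```python
import importlib


def describe_module(name):
    """Return the version and file location of an importable module, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "unknown"),
    }


if __name__ == "__main__":
    # Run this with the same interpreter that runs isocirc.
    print(describe_module("Bio"))
```

If the printed path points to a different site-packages directory than the one isocirc was installed into, the environment is picking up another copy of the package.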