Closed yerry77 closed 1 month ago
Hi @yerry77, thanks for reporting this problem.
That error seems to occur when trying to read the *corrected.gtf generated by sqanti3_qc.py itself, while parsing the start position, end position and, specifically, the strand. However, I'm unable to replicate it with my gtf files.
I see that you are using SQANTI3 v5.2.1. We have an updated version, v5.2.2, which I recommend you test to check whether this error still happens in the latest version.
Can you please share the corrected gtf, or check that the *corrected.gtf exists, is not empty, and is correctly formatted? The error seems to be related to reading and parsing that file.
Thanks
I have confirmed that the *corrected.gtf exists, but its format differs from other gtf files: it has no lines whose feature is gene, that is, it lacks gene information. Is this the reason? However, when I ran this file before, the output gtf file also had no lines whose feature is gene. Here, my input gtf file does have lines whose feature is gene; this file was produced by merging the data obtained from short-read sequencing.
Lacking the gene feature is expected behavior, so I don't think that is the problem here. The gtf file should have 9 columns, like the small example I attach: example_corrected.txt
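One way to run the format check described above is a short script that confirms the file is not empty and that every record has 9 tab-separated columns. This is only a sketch: the helper name `find_malformed_lines` is hypothetical and is not part of SQANTI3.

```python
import os


def find_malformed_lines(gtf_path, expected_columns=9):
    """Return (line_number, column_count) for every record that does not
    have the expected number of tab-separated columns.

    Hypothetical helper for illustration; not part of SQANTI3.
    """
    if os.path.getsize(gtf_path) == 0:
        raise ValueError(f"{gtf_path} is empty")

    malformed = []
    with open(gtf_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            # Skip blank lines and header/comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            n_columns = len(line.rstrip("\n").split("\t"))
            if n_columns != expected_columns:
                malformed.append((line_number, n_columns))
    return malformed
```

Calling `find_malformed_lines` with the path to the *corrected.gtf returns an empty list when every record has 9 columns.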
Hi @yerry77
Thanks to #334, we noticed a bug in the parsing of this file that makes SQANTI3 fail when transcripts do not have a strand assigned, as happened there with StringTie data. Do all of your transcripts have a '+' or '-' in the strand column (the 7th column of the gtf), or do some have a dot '.'?
I'm working on a fix for that error, so I wanted to know whether this also applies to your data.
Regards
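A quick way to answer the strand question is to tally the values found in the strand field (column 7 of a gtf record) for transcript lines. The sketch below assumes a standard tab-separated gtf; `count_strands` is a hypothetical helper name, not a SQANTI3 function.

```python
from collections import Counter


def count_strands(gtf_lines, feature="transcript"):
    """Count strand values (column 7) for records of the given feature type.

    Hypothetical helper for illustration; not part of SQANTI3.
    """
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # too short to contain a strand column
        if fields[2] == feature:
            counts[fields[6]] += 1
    return counts
```

Passing an open file handle, for example `count_strands(open(path))`, returns a `Counter`; any key other than '+' or '-' points to transcripts without an assigned strand.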
Is there an existing issue for this?
Have you loaded the SQANTI3.env conda environment?
Problem description
I have removed the transcripts with a '*' strand, but it still does not work.
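For reference, removing unstranded records can be sketched as below. The sketch keeps comment lines and drops every record (transcript and exon lines alike) whose strand field is anything other than '+' or '-', so both '.' and '*' are removed. `drop_unstranded` is a hypothetical helper name, and whether this filtering resolves the error is not established in this thread.

```python
def drop_unstranded(gtf_lines, valid_strands=("+", "-")):
    """Yield comment lines and records whose strand column is '+' or '-'.

    Hypothetical helper for illustration; not part of SQANTI3.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the strand in a gtf record.
        if len(fields) >= 7 and fields[6] in valid_strands:
            yield line
```

Because the log in this report shows existing intermediate files being reused ("already exists. Using it..."), it may be worth writing to a fresh output directory after filtering; that is a suggestion, not something confirmed here.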
Code sample
conda activate SQANTI3.env
export PYTHONPATH=$PYTHONPATH:/data/p/SQANTI3/cDNA_Cupcake/sequence/
export PYTHONPATH=$PYTHONPATH:/data/p/SQANTI3/cDNA_Cupcake/
python /data/p/SQANTI3/SQANTI3-5.2.1/sqanti3_qc_2.py \
  /data1/x/partners/LiFengXian/20231224PPG/20240626PPG2/20240912short_reads/20240913merge_gtf/filtered_gtf.gtf \
  /data1/pub/genome/Human/humanGENCODE/gencode.v46.annotation.gtf \
  /data1/pub/genome/Human/humanGENCODE/GRCh38.p14.genome.fa \
  --CAGE_peak /data1/x/partners/LiFengXian/20231224PPG/raw/20240328SQANTI3/refTSS_v3.3_mouse_coordinate.mm10.bed \
  --polyA_motif_list /data/p/SQANTI3/SQANTI3/data/polyA_motifs/mouse_and_human.polyA_motif.txt \
  -o PPG \
  -d /XCLabServer003_fastIO/20240918SQANTI3/ \
  --cpus 80 \
  --report both \
  --short_reads /XCLabServer003_fastIO/20240918SQANTI3/PPG_short_reads.fofn
Error
Error corrected FASTA /XCLabServer003_fastIO/20240918SQANTI3/PPG_corrected.fasta already exists. Using it...
Predicting ORF sequences...
ORF file /XCLabServer003_fastIO/20240918SQANTI3/PPG_corrected.faa already exists. Using it....
Parsing Reference Transcriptome....
/XCLabServer003_fastIO/20240918SQANTI3/refAnnotation_PPG.genePred already exists. Using it.
**** Parsing Isoforms....
Running calculation of TSS ratio
Traceback (most recent call last):
File "/data/p/SQANTI3/SQANTI3-5.2.1/sqanti3_qc_2.py", line 2577, in <module>
main()
File "/data/p/SQANTI3/SQANTI3-5.2.1/sqanti3_qc_2.py", line 2560, in main
run(args)
File "/data/p/SQANTI3/SQANTI3-5.2.1/sqanti3_qc_2.py", line 1875, in run
isoforms_info, ratio_TSS_dict = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict, corrGTF, star_out, star_index, SJcovNames, SJcovInfo)
File "/data/p/SQANTI3/SQANTI3-5.2.1/sqanti3_qc_2.py", line 1545, in isoformClassification
inside_bed, outside_bed = get_TSS_bed(corrGTF, chr_order)
File "/data/p/SQANTI3/SQANTI3-5.2.1/utilities/short_reads.py", line 127, in get_TSS_bed
strand=str(loc[2])
IndexError: list index out of range
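The last frame fails on `loc[2]`, which means the list `loc` held fewer than three elements. The sketch below shows one way that can happen, assuming (this is not confirmed in the thread) that a transcript location is rendered as a string such as `[100:200](+)` and then split into start, end, and strand. The function names and the string format are hypothetical; the actual code in `utilities/short_reads.py` may differ.

```python
import re


def split_location(location):
    """Split a location string into its non-empty parts.

    Assumes a hypothetical format such as "[100:200](+)".
    """
    parts = re.split(r"[\(\)\[\]:]", location)
    return [part for part in parts if part]


def get_strand(location, default="."):
    """Return the third part (the strand), or a default if it is missing."""
    loc = split_location(location)
    return loc[2] if len(loc) > 2 else default
```

Under this assumption, a stranded location yields three parts, while an unstranded one such as `[100:200]` yields only two, so indexing position 2 directly raises `IndexError`. A length check like the one in `get_strand` avoids the crash.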
Anything else?
No response