Have you loaded the SQANTI3.env conda environment?
[X] I have loaded the SQANTI3.env conda environment
Problem description
I'm annotating a plant genome using IsoSeq reads. The prior annotation was based on short reads only, so this is a significant improvement.
I'm trying to run sqanti3_rescue, but it crashes before it finishes.
The error says that the transcript_id and gene_id tags are missing, but I've checked both input GTF files, and both tags are present.
Could you help me figure out what I'm doing wrong?
Note that I used the conda environment from the SQANTI3 GitHub repository, since the release tarball was not the most recent version.
I've included all my SQANTI3 commands and their terminal output in Sqanti_run_Aug_2.txt, in case the problem was caused by an earlier step.
Code sample
```
~/bin/SQANTI3/sqanti3_rescue.py ml --isoforms Isoseq_corrected.fasta --gtf Isoseq-filtered.filtered.gtf -g B_napus.gtf -f B_napus.fasta -k B_napus_classification.txt --mode full -e all -o Isoseq-rescued -r randomforest.RData -j 0.7 Isoseq-filtered_MLresult_classification.txt
```
Error
```
RETRIEVING RESCUE TARGETS...
Rescue targets: validated LR or reference isoforms that could replace an artifact from the same gene.
Retrieving target genes...
Finding target isoforms from long read transcriptome...
Finding target isoforms from reference transcriptome...
Error in check_tag_present(c(transcript_id, gene_id), tags, error = TRUE) :
Tags transcript_id, gene_id are absent from the attribute field.
Calls: -> tr2g_GRanges -> check_tag_present
Execution halted
Traceback (most recent call last):
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/.conda/envs/SQANTI3.env/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
```
The same sqanti3_rescue.py command runs to completion with --mode automatic; it fails only with --mode full.
I've since tested this further, and the error occurs with both ML filtering and rules filtering.
Here are excerpts from both input GTF files, showing the transcript_id and gene_id tags:
Isoseq-filtered.filtered.gtf:
```
Bna.4DH.A01 PacBio transcript 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio exon 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio transcript 21075 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21075 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 22343 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21277 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 22343 22910 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
...
```
B_napus.gtf:
```
Bna.4DH.A01 AAFC_GIFS gene 3830 6473 . - . gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6025 6348 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5572 5703 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5090 5387 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4977 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6025 6348 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5572 5703 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5090 5387 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4977 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4969 . - 0 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 5992 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5992 6348 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5992 6348 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
...
```
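To confirm that the tags are present on every record, not only in the excerpts, a short script can scan each whole file. This is a minimal sketch and not part of SQANTI3: the file name audit_gtf.py and the functions parse_attributes and audit_gtf are hypothetical. It assumes tab-separated columns and double-quoted attribute values (the GTF2.2 convention). For each file it reports the feature types, the number of lines without exactly 9 tab-separated columns, and, per feature type, how many records lack transcript_id or gene_id. For example, the `gene` record in the B_napus.gtf excerpt carries gene_id and gene_name but no transcript_id, and the reference uses `mRNA` where the long-read GTF uses `transcript`. Whether either difference matters to sqanti3_rescue.py is not established here.

```python
import re
import sys
from collections import Counter

# One GTF attribute, e.g. transcript_id "PB.2.1". Assumes quoted values
# (GTF2.2 convention); unquoted values are NOT captured by this pattern.
ATTR_RE = re.compile(r'([^\s;"]+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict mapping tag -> value for column 9 of a GTF record."""
    return dict(ATTR_RE.findall(field))


def audit_gtf(lines, tags=("transcript_id", "gene_id")):
    """Tally feature types and, per feature type, records lacking each tag.

    Returns (features, missing, malformed): two Counters, plus the number of
    non-comment lines that do not have exactly 9 tab-separated columns.
    """
    features, missing, malformed = Counter(), Counter(), 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            malformed += 1  # e.g. space-separated instead of tab-separated
            continue
        feature = cols[2]
        features[feature] += 1
        attrs = parse_attributes(cols[8])
        for tag in tags:
            if tag not in attrs:
                missing[(feature, tag)] += 1
    return features, missing, malformed


if __name__ == "__main__":
    # Hypothetical usage: python audit_gtf.py B_napus.gtf Isoseq-filtered.filtered.gtf
    for path in sys.argv[1:]:
        with open(path) as handle:
            features, missing, malformed = audit_gtf(handle)
        print(path)
        print("  feature types:", dict(features))
        print("  malformed lines:", malformed)
        for (feature, tag), n in sorted(missing.items()):
            print(f"  {n} '{feature}' record(s) lack {tag}")
```

Running it on both inputs side by side makes it easy to see whether one file has records (or whole feature types) without the two tags, or lines that are not tab-delimited.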
Anything else?
Sqanti_run_Aug_2.txt