Status: Closed (Ellenski closed 1 year ago)
The specification does not allow -2 as a frame value in GTF or GFF. Only '0', '1', '2' or '.' are allowed. See https://agat.readthedocs.io/en/latest/gxf.html#main-points-and-differences-between-gff-formats
So check column 8 and replace any value outside this set with '.'
You can also simply replace every value in the 8th column with '.'
If you really need this information, you will need your FASTA file; AGAT can then recompute it for you via agat_sp_fix_cds_phases.pl
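The column 8 check described above can be scripted. The following is a minimal sketch, not part of AGAT: the function name `invalid_frames` and the demo lines are made up for illustration, and it assumes a tab-delimited GTF/GFF file.

```python
import io

# Values allowed in column 8, as listed in the comment above.
VALID_FRAMES = {"0", "1", "2", "."}

def invalid_frames(handle):
    """Yield (line_number, value) for each feature line whose column 8 is not allowed."""
    for line_number, line in enumerate(handle, start=1):
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 8 is index 7; lines with fewer than 8 columns are ignored here.
        if len(fields) >= 8 and fields[7] not in VALID_FRAMES:
            yield line_number, fields[7]

# Hypothetical two-line input: the second line carries an invalid frame.
demo = (
    'chr1\tLiftoff\tCDS\t1\t9\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n'
    'chr1\tLiftoff\tCDS\t20\t30\t.\t+\t-2\tgene_id "g1"; transcript_id "t1";\n'
)
for line_number, value in invalid_frames(io.StringIO(demo)):
    print(f"line {line_number}: invalid frame {value!r}")
```

For a real annotation, pass an open file handle instead of the `io.StringIO` demo. An empty result means column 8 itself is clean.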
Hi Jacques! Thanks for the information. I checked my GTF when I got the error message and could not find any instances of "-2" in column 8 so it is very strange that I am getting the error. I will double check the file and try using the command you suggested. Let me know if you have any other thoughts.
Hi again. So, I'm still having trouble with the above error message. I went into my GTF file and confirmed that none of the entries in column 8 have a value of "-2". Then, just to be safe, I forced all column 8 entries to be ".". This resulted in the same error message as before when running agat_sp_extract_sequences.pl. I also looked into the agat_sp_fix_cds_phases.pl command that you suggested, but running it produced the same error message (MSG: '-2' is not a valid frame) and a blank file. This makes me think that whatever the error is, it does not stem from column 8 in the GTF. As mentioned before, none of my other GTF files created with the same pipeline have encountered this issue. Do you have any other suggestions for what I might try? Thank you so much for your help.
Then it may indeed be an error introduced by AGAT. Could you share the file that gives this error?
Sure thing! I emailed you the files.
Your case is very peculiar.
First (not directly related to the problem), I don't understand why you use --remove_orf_offset for the extraction. If you have a functional ORF, this will almost certainly break it (it removes the offset from each piece of the CDS). Try it on one gene using the --split parameter for a better understanding (once with --remove_orf_offset and once without).
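To make the first point concrete, here is a toy illustration. It is not AGAT's code. It only assumes, as stated above, that the offset (phase) is trimmed from the start of every CDS piece. The helper names and the 9-base ORF are invented for the example.

```python
def join_pieces(pieces, trim_offset):
    """Concatenate CDS pieces given as (sequence, phase) tuples in transcript order.

    With trim_offset=True, `phase` bases are dropped from the start of every piece.
    """
    return "".join(seq[phase:] if trim_offset else seq for seq, phase in pieces)

def codons(seq):
    """Split a sequence into consecutive triplets (the last one may be incomplete)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

# Toy ORF ATG GCC TAA cut after 4 bases. The second piece starts 2 bases
# before the next codon boundary, so its phase is 2.
pieces = [("ATGG", 0), ("CCTAA", 2)]

intact = join_pieces(pieces, trim_offset=False)
trimmed = join_pieces(pieces, trim_offset=True)
print(intact, codons(intact))
print(trimmed, codons(trimmed))
```

Joining the pieces unchanged restores the original reading frame, while trimming the phase from the internal piece removes bases that belong to the ORF and shifts every codon after the junction.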
Second, did you use -polish when running Liftoff? I ask because the problematic case is the following:
Chr11_NC_045647.1_20541412_21193702 Liftoff transcript 553066 579467 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "mRNA"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; Parent "ATG7"; matches_ref_protein "False"; valid_ORF "False"; missing_start_codon "True"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 553066 553260 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "18"; Parent "XM_041718735.1"; ID "exon_437101"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 553751 553873 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "17"; Parent "XM_041718735.1"; ID "exon_437100"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 559329 559409 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "16"; Parent "XM_041718735.1"; ID "exon_437099"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 562384 562459 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "15"; Parent "XM_041718735.1"; ID "exon_437098"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 563153 563268 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "14"; Parent "XM_041718735.1"; ID "exon_437097"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 563522 563725 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "13"; Parent "XM_041718735.1"; ID "exon_437096"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 571157 571315 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "11"; Parent "XM_041718735.1"; ID "exon_437094"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 571926 572070 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "10"; Parent "XM_041718735.1"; ID "exon_437093"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 573818 573908 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "9"; Parent "XM_041718735.1"; ID "exon_437092"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 574872 574993 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "8"; Parent "XM_041718735.1"; ID "exon_437091"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 576044 576132 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "7"; Parent "XM_041718735.1"; ID "exon_437090"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 576698 576847 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "6"; Parent "XM_041718735.1"; ID "exon_437089"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 577804 577920 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "5"; Parent "XM_041718735.1"; ID "exon_437088"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 578482 578559 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "4"; Parent "XM_041718735.1"; ID "exon_437087"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff exon 579350 579467 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "3"; Parent "XM_041718735.1"; ID "exon_437086"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 553751 553873 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "17"; Parent "XM_041718735.1"; ID "CDS_388506"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 559329 559409 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "16"; Parent "XM_041718735.1"; ID "CDS_388505"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 562384 562459 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "15"; Parent "XM_041718735.1"; ID "CDS_388504"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 563153 563268 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "14"; Parent "XM_041718735.1"; ID "CDS_388503"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 563522 563725 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "13"; Parent "XM_041718735.1"; ID "CDS_388502"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 571157 571315 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "11"; Parent "XM_041718735.1"; ID "CDS_388500"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 571926 572070 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "10"; Parent "XM_041718735.1"; ID "CDS_388499"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 573818 573908 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "9"; Parent "XM_041718735.1"; ID "CDS_388498"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 574872 574993 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "8"; Parent "XM_041718735.1"; ID "CDS_388497"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 576044 576132 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "7"; Parent "XM_041718735.1"; ID "CDS_388496"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 576698 576847 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "6"; Parent "XM_041718735.1"; ID "CDS_388495"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 577804 577920 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "5"; Parent "XM_041718735.1"; ID "CDS_388494"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 578482 578559 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "4"; Parent "XM_041718735.1"; ID "CDS_388493"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 579350 579467 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "3"; Parent "XM_041718735.1"; ID "CDS_388492"; extra_copy_number "0";
Chr11_NC_045647.1_20541412_21193702 Liftoff stop_codon 553256 553260 . - . gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "18"; Parent "XM_041718735.1"; ID "stop_codon_26423"; extra_copy_number "0";
The stop codon, which I guess was originally 3 bp long, is now 5 bp long. That is not expected and messes up some internal logic in AGAT. I would avoid keeping start and stop codon features when using Liftoff.
What I suggest is to throw away all start and stop codon features and use agat_sp_add_start_and_stop.pl to re-introduce them if you really need them. They are useless if your plan is to do CDS translation.
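Both steps suggested above (spotting start/stop codon features that are not 3 bp long, and discarding codon features before re-adding them) can be sketched in a few lines. This is an assumption-laden sketch rather than an AGAT utility: the function names are hypothetical, the input is assumed to be tab-delimited GTF, and lengths are summed per transcript because a codon may legitimately be split across two exons.

```python
import re
from collections import defaultdict

CODON_TYPES = {"start_codon", "stop_codon"}
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def codon_lengths(lines):
    """Return {(transcript_id, feature_type): total length in bp} for codon features."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Comment lines and malformed lines have fewer than 9 tab-separated fields.
        if len(fields) < 9 or fields[2] not in CODON_TYPES:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        transcript = match.group(1) if match else "<no transcript_id>"
        # GTF coordinates are 1-based and inclusive.
        totals[(transcript, fields[2])] += int(fields[4]) - int(fields[3]) + 1
    return dict(totals)

def suspicious_codons(lines):
    """Keep only the start/stop codons whose summed length is not exactly 3 bp."""
    return {key: bp for key, bp in codon_lengths(lines).items() if bp != 3}

def drop_codon_features(lines):
    """Return the lines with every start_codon/stop_codon feature removed."""
    kept = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] in CODON_TYPES:
            continue
        kept.append(line)
    return kept

# Coordinates taken from the stop_codon line quoted above (attributes shortened).
stop = ('Chr11_NC_045647.1_20541412_21193702\tLiftoff\tstop_codon\t553256\t553260'
        '\t.\t-\t.\tgene_id "ATG7"; transcript_id "XM_041718735.1";\n')
cds = ('Chr11_NC_045647.1_20541412_21193702\tLiftoff\tCDS\t553751\t553873'
       '\t.\t-\t.\tgene_id "ATG7"; transcript_id "XM_041718735.1";\n')
print(suspicious_codons([cds, stop]))
```

Writing the output of `drop_codon_features` to a new file gives an annotation without codon features, which is the input state the suggestion above assumes before running agat_sp_add_start_and_stop.pl.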
Hi Jacques, Thank you for your insight! I will look over my code again and clarify if I made any errors writing it. I'll also keep an eye out for this issue in the future so that it can be avoided. I really appreciate your patience and your help!
Hi There!
I hope you are keeping well. I am currently using AGAT, specifically agat_sp_extract_sequences.pl, to extract CDS sequences from a FASTA file with the following command:
time agat_sp_extract_sequences.pl -g Hypocnemis_striata_Taen_Gutt_lifted.gtf -f All_Species_Folded_Genomes/Hypocnemis_striata_striata__JTW1312_folded.temp.fasta -t cds --remove_orf_offset -o All_Species_transcriptome/Hypocnemis_striata_striata__JTW1312.cds.fasta
I keep receiving the following error message, which leads to a blank .cds.fasta file:
------------- EXCEPTION -------------
MSG: '-2' is not a valid frame
STACK Bio::SeqFeature::Generic::frame /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/Bio/SeqFeature/Generic.pm:506
STACK AGAT::OmniscientI::_check_cds /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/AGAT/OmniscientI.pm:2015
STACK AGAT::OmniscientI::slurp_gff3_file_JD /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/AGAT/OmniscientI.pm:489
STACK toplevel /home/ellen/miniconda3/envs/agat/bin/agat_sp_extract_sequences.pl:145
-------------------------------------
I have remade all the input files to ensure that there are no errors. I have also run this same command on about 30 other sets of files produced by the same pipeline, and all of them worked as intended and produced cds.fasta files.
The error stems from this block of code in Generic.pm:
 Title   : frame
 Usage   : my $frame = $feat->frame();
           $feat->frame($frame);
 Function: get/set on frame information
 Returns : 0,1,2, '.'
 Args    : none if get, the new value if set
=cut
sub frame { my $self = shift;
}
If you have any ideas how to address or interpret this issue, I would really appreciate it!
Thanks so much for your help!