NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

MSG: '-2' is not a valid frame #396

Closed Ellenski closed 1 year ago

Ellenski commented 1 year ago

Hi There!

I hope you are keeping well. I am currently using AGAT and the agat_sp_extract_sequences.pl to extract a CDS from a fasta file using the following line of code:

time agat_sp_extract_sequences.pl -g Hypocnemis_striata_Taen_Gutt_lifted.gtf -f All_Species_Folded_Genomes/Hypocnemis_striata_striata__JTW1312_folded.temp.fasta -t cds --remove_orf_offset -o All_Species_transcriptome/Hypocnemis_striata_striata__JTW1312.cds.fasta

I keep receiving the following error message, which leads to the production of a blank .cds.fasta file:

------------- EXCEPTION -------------
MSG: '-2' is not a valid frame
STACK Bio::SeqFeature::Generic::frame /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/Bio/SeqFeature/Generic.pm:506
STACK AGAT::OmniscientI::_check_cds /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/AGAT/OmniscientI.pm:2015
STACK AGAT::OmniscientI::slurp_gff3_file_JD /home/ellen/miniconda3/envs/agat/lib/perl5/site_perl/AGAT/OmniscientI.pm:489
STACK toplevel /home/ellen/miniconda3/envs/agat/bin/agat_sp_extract_sequences.pl:145
-------------------------------------

I have remade all the input files to ensure that there are no errors, and I have also run this same command on about 30 other sets of files produced by the same set of pipelines, which all worked as intended and produced cds.fasta files.

The error stems from this block of code in Generic.pm:

 Title   : frame
 Usage   : my $frame = $feat->frame();
           $feat->frame($frame);
 Function: get/set on frame information
 Returns : 0,1,2, '.'
 Args    : none if get, the new value if set

=cut

sub frame {
    my $self = shift;

    if ( @_ ) {
        my $value = shift;
        if ( defined $value &&
             $value !~ /^[0-2.]$/ ) {
            $self->throw("'$value' is not a valid frame");
        }
        if( defined $value && $value eq '.' ) { $value = '.' }
        return $self->{'_gsf_frame'} = $value;
    }
    return $self->{'_gsf_frame'};
}
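To make the check in the quoted setter easier to read, here is a rough Python sketch of the same validation. The function name `is_valid_frame` is hypothetical and not part of BioPerl or AGAT; it only approximates the Perl regex `/^[0-2.]$/` to show why a value such as -2 is rejected.

```python
import re

# Same character class as the Perl check: exactly one of '0', '1', '2' or '.'.
# Note: Perl's `$` also tolerates a trailing newline; fullmatch() does not.
VALID_FRAME = re.compile(r"[0-2.]")


def is_valid_frame(value):
    """Return True if `value` would pass the check in the quoted frame() setter.

    Hypothetical helper for illustration only.
    """
    # The Perl code only validates defined values, so None is accepted here.
    if value is None:
        return True
    return VALID_FRAME.fullmatch(str(value)) is not None
```

With this sketch, `is_valid_frame("-2")` is false because the string is two characters long and starts with `-`, so any code path that passes a negative number to the setter triggers the exception, whatever column 8 of the input file contains.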

If you have any ideas how to address or interpret this issue, I would really appreciate it!

Thanks so much for your help!

Juke34 commented 1 year ago

The specifications do not allow -2 as a frame in GTF or GFF. Only '0', '1', '2' or '.' are allowed. See https://agat.readthedocs.io/en/latest/gxf.html#main-points-and-differences-between-gff-formats So check your column 8 and replace any value outside this set with '.'. You can also decide to remove all values from the 8th column and replace them with '.'. If you really need this information, you need your FASTA file: AGAT can re-compute it for you via agat_sp_fix_cds_phases.pl
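A minimal sketch of the column 8 check described above, assuming a tab-separated GTF/GFF with nine columns per record. The function names `find_invalid_frames` and `reset_invalid_frames` are made up for this example and are not part of AGAT.

```python
VALID_FRAMES = {"0", "1", "2", "."}


def find_invalid_frames(lines):
    """Yield (line_number, value) for each record whose column 8 is not allowed."""
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a standard 9-column record
        if columns[7] not in VALID_FRAMES:
            yield number, columns[7]


def reset_invalid_frames(lines):
    """Return a copy of `lines` where invalid column 8 values are replaced by '.'."""
    fixed = []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        is_record = not line.startswith("#") and len(columns) >= 9
        if is_record and columns[7] not in VALID_FRAMES:
            columns[7] = "."
            line = "\t".join(columns) + "\n"
        fixed.append(line)
    return fixed
```

One way to use it is to open the GTF, pass the file handle to `find_invalid_frames`, and print whatever it yields. If nothing is reported, the offending value is not coming from column 8 of the input.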

Ellenski commented 1 year ago

Hi Jacques! Thanks for the information. I checked my GTF when I got the error message and could not find any instances of "-2" in column 8 so it is very strange that I am getting the error. I will double check the file and try using the command you suggested. Let me know if you have any other thoughts.

Ellenski commented 1 year ago

Hi again. So, I'm still having trouble with the above error message. I went into my GTF file and confirmed that none of the entries in column 8 have a value of "-2". Then, just to be safe, I forced all column 8 entries to be ".". This resulted in the same error message as before when running agat_sp_extract_sequences.pl. I also looked into the agat_sp_fix_cds_phases.pl command that you suggested, but running it produced the same error message (MSG: '-2' is not a valid frame) and a blank file. This makes me think that whatever the error is, it does not stem from column 8 in the GTF. As mentioned before, none of my other GTF files that were created using the same pipeline as the one I am currently working with have encountered this issue. Do you have any other suggestions for what I might try? Thank you so much for your help.

Juke34 commented 1 year ago

Then it might indeed be an error introduced by AGAT. Could you share the file that gives this error?

Ellenski commented 1 year ago

Sure thing! I emailed you the files.

Juke34 commented 1 year ago

Your case is very peculiar. First (not directly related to the problem), I don't understand why you use --remove_orf_offset for the extraction. If you have a functional ORF, it will almost certainly break it (it removes the offset from each piece of the CDS). Try one gene using the --split parameter for better understanding (once with --remove_orf_offset and once without).

Second, did you use -polish when using liftoff? Because the problematic case is the following:

Chr11_NC_045647.1_20541412_21193702 Liftoff transcript  553066  579467  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "mRNA"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; Parent "ATG7"; matches_ref_protein "False"; valid_ORF "False"; missing_start_codon "True"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    553066  553260  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "18"; Parent "XM_041718735.1"; ID "exon_437101"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    553751  553873  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "17"; Parent "XM_041718735.1"; ID "exon_437100"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    559329  559409  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "16"; Parent "XM_041718735.1"; ID "exon_437099"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    562384  562459  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "15"; Parent "XM_041718735.1"; ID "exon_437098"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    563153  563268  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "14"; Parent "XM_041718735.1"; ID "exon_437097"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    563522  563725  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "13"; Parent "XM_041718735.1"; ID "exon_437096"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    571157  571315  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "11"; Parent "XM_041718735.1"; ID "exon_437094"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    571926  572070  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "10"; Parent "XM_041718735.1"; ID "exon_437093"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    573818  573908  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "9"; Parent "XM_041718735.1"; ID "exon_437092"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    574872  574993  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "8"; Parent "XM_041718735.1"; ID "exon_437091"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    576044  576132  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "7"; Parent "XM_041718735.1"; ID "exon_437090"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    576698  576847  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "6"; Parent "XM_041718735.1"; ID "exon_437089"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    577804  577920  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "5"; Parent "XM_041718735.1"; ID "exon_437088"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    578482  578559  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "4"; Parent "XM_041718735.1"; ID "exon_437087"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff exon    579350  579467  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gene "ATG7"; model_evidence "Supporting evidence includes similarity to: 3 ESTs, 6 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 13 samples with support for all annotated introns"; product "autophagy related 7, transcript variant X4"; transcript_biotype "mRNA"; exon_number "3"; Parent "XM_041718735.1"; ID "exon_437086"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 553751  553873  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "17"; Parent "XM_041718735.1"; ID "CDS_388506"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 559329  559409  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "16"; Parent "XM_041718735.1"; ID "CDS_388505"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 562384  562459  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "15"; Parent "XM_041718735.1"; ID "CDS_388504"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 563153  563268  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "14"; Parent "XM_041718735.1"; ID "CDS_388503"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 563522  563725  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "13"; Parent "XM_041718735.1"; ID "CDS_388502"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 571157  571315  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "11"; Parent "XM_041718735.1"; ID "CDS_388500"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 571926  572070  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "10"; Parent "XM_041718735.1"; ID "CDS_388499"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 573818  573908  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "9"; Parent "XM_041718735.1"; ID "CDS_388498"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 574872  574993  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "8"; Parent "XM_041718735.1"; ID "CDS_388497"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 576044  576132  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "7"; Parent "XM_041718735.1"; ID "CDS_388496"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 576698  576847  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "6"; Parent "XM_041718735.1"; ID "CDS_388495"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 577804  577920  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "5"; Parent "XM_041718735.1"; ID "CDS_388494"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 578482  578559  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "4"; Parent "XM_041718735.1"; ID "CDS_388493"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff CDS 579350  579467  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "3"; Parent "XM_041718735.1"; ID "CDS_388492"; extra_copy_number "0"; 
Chr11_NC_045647.1_20541412_21193702 Liftoff stop_codon  553256  553260  .   -   .   gene_id "ATG7"; transcript_id "XM_041718735.1"; db_xref "GeneID:100229797"; gbkey "CDS"; gene "ATG7"; product "ubiquitin-like modifier-activating enzyme ATG7 isoform X3"; protein_id "XP_041574669.1"; exon_number "18"; Parent "XM_041718735.1"; ID "stop_codon_26423"; extra_copy_number "0"; 

The stop codon, which I guess was originally 3 bp long, is now 5 bp long. That is not expected and messes up some internal logic in AGAT. I guess you should avoid keeping start and stop features when using liftoff.
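For anyone wanting to spot this kind of record before running AGAT, here is a small sketch that sums the length of start_codon and stop_codon features per transcript and reports any total other than 3 bp. It assumes a tab-separated GTF with a `transcript_id "..."` attribute, as in the records above; the function names are hypothetical and this is not how AGAT itself performs the check. Lengths are summed per transcript because a codon can legitimately be split across two exons.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
CODON_TYPES = ("start_codon", "stop_codon")


def codon_lengths(lines):
    """Return {(transcript_id, feature_type): total length in bp} for codon features."""
    totals = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] not in CODON_TYPES:
            continue
        match = TRANSCRIPT_ID.search(columns[8])
        transcript = match.group(1) if match else None
        # GTF coordinates are 1-based and inclusive.
        totals[(transcript, columns[2])] += int(columns[4]) - int(columns[3]) + 1
    return dict(totals)


def suspicious_codons(lines):
    """Return only the codon features whose summed length is not 3 bp."""
    return {key: size for key, size in codon_lengths(lines).items() if size != 3}
```

Applied to the stop_codon record shown above (553256 to 553260), the computed length is 5, so the transcript XM_041718735.1 would be reported.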

What I suggest is to throw away all start and stop codon features and use agat_sp_add_start_and_stop.pl to re-introduce them if you really need them. They are useless if your plan is to do CDS translation.
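One possible way to throw away the start and stop codon features is sketched below, assuming a tab-separated GTF where column 3 holds the feature type. The helper name `drop_codon_features` is hypothetical and not part of AGAT or Liftoff.

```python
def drop_codon_features(lines, types=("start_codon", "stop_codon")):
    """Return `lines` without records whose feature type (column 3) is in `types`."""
    kept = []
    for line in lines:
        columns = line.split("\t")
        is_record = not line.startswith("#") and len(columns) >= 9
        if is_record and columns[2] in types:
            continue  # skip start/stop codon records
        kept.append(line)  # comments and all other features are kept unchanged
    return kept
```

The filtered lines can be written to a new GTF, which can then be given to agat_sp_add_start_and_stop.pl if the codon features are needed again.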

Ellenski commented 1 year ago

Hi Jacques, Thank you for your insight! I will look over my code again and clarify if I made any errors writing it. I'll also keep an eye out for this issue in the future so that it can be avoided. I really appreciate your patience and your help!