NAL-i5K / GFF3toolkit

Python programs for processing GFF3 files

Error in gff3_fix #128

Open DiegoSafian opened 1 year ago

DiegoSafian commented 1 year ago

Hi, after successfully running gff3_QC, gff3_fix gives me the following error:

(genometools) [safiand@login001 grass]$ gff3_fix -qc_r test.txt -g turneri_annotation.gff3 -og new_corrected.gff3
INFO     Checking QC report file (test.txt)...
INFO     Checking GFF3 file (turneri_annotation.gff3)...
INFO     Reading QC report file: (test.txt)...
INFO     Reading GFF3 file: (turneri_annotation.gff3)...
Traceback (most recent call last):
  File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/bin/gff3_fix", line 8, in <module>
    sys.exit(script_main())
  File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/bin/gff3_fix.py", line 95, in script_main
    gff3_fix.fix.main(gff3=gff3, output_gff=args.output_gff, error_dict=error_dict, line_num_dict=line_num_dict, logger=logger_stderr)
  File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 692, in main
    split(gff3=gff3, error_list=error_dict[error_code], logger=logger)
  File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 165, in split
    childrenlist.append(c1['attributes']['ID'])
KeyError: 'ID'

So I tried gff3_ID_generator.py, but it gives me a similar error:

(genometools) [safiand@login001 grass]$ python gff3_ID_generator.py -g turneri_annotation.gff3 -og new.gff3
INFO     Reading input gff3 file: (turneri_annotation.gff3)
INFO     Generate new ID for features in (turneri_annotation.gff3)
Traceback (most recent call last):
  File "/camp/lab/cardoso-moreiam/home/users/safiand/genome_annotation/turneri/busco/turneri_rna_prot_multiples_species/grass/gff3_ID_generator.py", line 333, in <module>
    main(in_gff=args.gff, merge_report=args.merge_report, out_merge_report=args.out_merge_report, out_gff=args.output_gff, uuid_on=args.universally_unique_identifier, prefix=args.idprefix, digitlen=args.digitlen, report=args.report, alias=args.alias)
  File "/camp/lab/cardoso-moreiam/home/users/safiand/genome_annotation/turneri/busco/turneri_rna_prot_multiples_species/grass/gff3_ID_generator.py", line 238, in main
    ID_dict[child['attributes']['ID']] = [newcID]
KeyError: 'ID'

What can I do to solve this problem? Am I doing something wrong?

My gff3 file looks like this:

(genometools) [safiand@login001 grass]$ head turneri_annotation.gff3 -n 20
# gffread augustus.hints.gtf -o turnerifiltered.gff3 --merge -L -g GCA_922788865.1_HVK001PTURNERI_genomic.shortID.fna
# gffread v0.11.6
##gff-version 3
CAKLNU010000942.1       gffcl   locus   724     2835    .       +       .       ID=RLOC_00000001;transcripts=jg1.t1
CAKLNU010000942.1       AUGUSTUS        transcript      724     2835    .       +       .       ID=jg1.t1;geneID=jg1;locus=RLOC_00000001
CAKLNU010000942.1       AUGUSTUS        CDS     724     1083    .       +       0       Parent=jg1.t1
CAKLNU010000942.1       AUGUSTUS        CDS     1181    1625    0.34    +       0       Parent=jg1.t1
CAKLNU010000942.1       AUGUSTUS        CDS     2270    2835    0.42    +       2       Parent=jg1.t1
CAKLNU010000422.1       gffcl   locus   1528    9153    .       +       .       ID=RLOC_00000002;transcripts=jg2.t1
CAKLNU010000422.1       AUGUSTUS        transcript      1528    9153    .       +       .       ID=jg2.t1;geneID=jg2;locus=RLOC_00000002
CAKLNU010000422.1       AUGUSTUS        CDS     1528    1574    0.69    +       1       Parent=jg2.t1
CAKLNU010000422.1       AUGUSTUS        CDS     1718    1788    0.68    +       2       Parent=jg2.t1
CAKLNU010000422.1       AUGUSTUS        CDS     9010    9153    0.6     +       0       Parent=jg2.t1
CAKLNU010000746.1       gffcl   locus   834     3644    .       -       .       ID=RLOC_00000003;transcripts=jg3.t1
CAKLNU010000746.1       AUGUSTUS        transcript      834     3644    .       -       .       ID=jg3.t1;geneID=jg3;locus=RLOC_00000003
CAKLNU010000746.1       AUGUSTUS        CDS     834     878     0.96    -       2       Parent=jg3.t1
CAKLNU010000746.1       AUGUSTUS        CDS     988     1011    1       -       2       Parent=jg3.t1
CAKLNU010000746.1       AUGUSTUS        CDS     1310    1336    1       -       2       Parent=jg3.t1
CAKLNU010000746.1       AUGUSTUS        CDS     2483    2518    1       -       2       Parent=jg3.t1
CAKLNU010000746.1       AUGUSTUS        CDS     2597    2695    1       -       2       Parent=jg3.t1

Thanks!

mpoelchau commented 1 year ago

@DiegoSafian apologies, I completely missed this issue. Can you try removing the locus features from your gff3 file, to see if that is what the ID generator is erroring out on?
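
If it helps, here is a minimal sketch of that test in plain Python. It is not part of GFF3toolkit. The function names and the output file name are made up for illustration, and it assumes the file is tab-separated with the feature type in column 3, as standard GFF3 requires. The first function drops every feature of a given type (locus by default). The second counts, per feature type, how many features have no ID attribute, since both tracebacks above fail on a lookup of 'ID'. In the pasted excerpt the CDS lines carry only Parent=, so it may be worth checking those as well.

```python
from collections import Counter


def drop_feature_type(lines, feature_type="locus"):
    """Yield GFF3 lines, skipping features whose type (column 3) matches."""
    for line in lines:
        # Keep comments, directives, and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature_type:
            continue
        yield line


def count_features_without_id(lines):
    """Count features per type whose attributes column has no ID key."""
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature line
        # Attributes are ';'-separated key=value pairs.
        keys = {
            field.split("=", 1)[0].strip()
            for field in cols[8].split(";")
            if "=" in field
        }
        if "ID" not in keys:
            missing[cols[2]] += 1
    return missing


def strip_locus_file(in_path, out_path, feature_type="locus"):
    """Write a copy of in_path to out_path without the given feature type."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_feature_type(src, feature_type))
```

For example, strip_locus_file("turneri_annotation.gff3", "turneri_annotation.nolocus.gff3") would write a locus-free copy (the output name is only a placeholder) that can then be passed to gff3_ID_generator.py with -g. Running count_features_without_id on the open file shows which feature types lack an ID.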