Closed nguyenjn1906 closed 1 year ago
Hi @nguyenjn1906 ,
File "/home/nguyenjn/.conda/envs/emapperinstall/lib/python3.7/site-packages/eggnogmapper/deco/decoration.py", line 91, in decorate_gff
g_score, g_strand, g_phase, g_attrs) = list(map(str.strip, line.split("\t")))
ValueError: not enough values to unpack (expected 9, got 1)
What is the format of your .gff file? Is it tab separated?
I believe it is tab separated. Here are a couple of sample lines from the .gff file.
mrdA_1 Prodigal:2.6 CDS 1 1893 . + 0 ID=HODOENEF_00001;eC_number=3.4.16.4;Name=mrdA_1;dbxref=COG:COG0768;gene=mrdA_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P0AD65;locus_tag=HODOENEF_00001;product=Peptidoglycan D%2CD-transpeptidase MrdA
cynR_1 Prodigal:2.6 CDS 1 960 . + 0 ID=HODOENEF_00002;Name=cynR_1;gene=cynR_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P27111;locus_tag=HODOENEF_00002;product=HTH-type transcriptional regulator CynR
proP_1 Prodigal:2.6 CDS 1 1302 . + 0 ID=HODOENEF_00003;Name=proP_1;gene=proP_1;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P0C0L7;locus_tag=HODOENEF_00003;product=Proline/betaine transporter
Hi @nguyenjn1906 ,
Just to be sure, could you check it with awk?
cat GFF_FILE | awk -F $'\t' '{print NF}' | sort | uniq -c
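For anyone without awk at hand, the same check can be sketched in Python. This is only an illustrative equivalent of the one-liner above: the function name `field_count_histogram` and the sample lines are made up for this example and are not part of eggnog-mapper.

```python
from collections import Counter


def field_count_histogram(lines):
    """Map 'number of tab-separated fields' to 'number of lines with that many fields'."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # awk reports NF=0 for an empty line, whereas "".split("\t") returns [""]
        n_fields = len(line.split("\t")) if line else 0
        counts[n_fields] += 1
    return counts


# Made-up in-memory sample; for a real file, pass an open file handle instead.
sample = [
    "##gff-version 3\n",
    "MGS_0\tProdigal:2.6\tCDS\t1\t1893\t.\t+\t0\tID=EXAMPLE_00001\n",
    ">MGS_2537\n",
]
for n_fields, n_lines in sorted(field_count_histogram(sample).items()):
    print(n_lines, n_fields)
```

A valid GFF feature line should show up under 9 fields; anything else (headers, FASTA records) falls into other buckets.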
Hello,
I get the same error, also with a Prokka .gff output file.
The result of the awk command line:
cat annotation/predicted_genes/prokka/Sample_Name.gff | awk -F $'\t' '{print NF}' | sort | uniq -c
606887 1
33056 9
All the lines with only one column are the header lines (at the start of the file):
head annotation/predicted_genes/prokka/Sample_Name.gff
##gff-version 3
##sequence-region MGS_0 1 1720
##sequence-region MGS_1 1 3645
##sequence-region MGS_2 1 6843
##sequence-region MGS_3 1 5133
##sequence-region MGS_4 1 6541
##sequence-region MGS_5 1 8044
##sequence-region MGS_6 1 1667
##sequence-region MGS_7 1 2258
##sequence-region MGS_8 1 8431
and the contig sequences at the end of the file:
tail annotation/predicted_genes/prokka/Sample_Name.gff
>MGS_2537
ACAGACTTGCCTTTCCCATTCTTCCCCACTAATACATTAACATCGTCCAAAACCCATTCA
ACATTATATTCATCGAAGAGGTTCTCTATACTTAATTTTTTTATTTTTACGCTCATTAAT
CTACCGACCTGATAATCCATTATGTTTGATGAGTAACACTGTAGCTTGATTCTCGCTTCA
...
CATCGTCCAAAACCCATTCAACATTATATTCATCGAAGAGGTTCTCTATACTTAATTTTT
TTATTTTTACGCTCATTAATC
I read your code, and I think the problem is in this loop: https://github.com/eggnogdb/eggnog-mapper/blob/master/eggnogmapper/deco/decoration.py#L84. You could stop reading when you reach the "##FASTA" line; everything after it is the contig sequences.
Regards,
Steven
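The suggestion above could look roughly like the sketch below. This is not the actual eggnog-mapper code: `iter_gff_features` is a hypothetical helper written for illustration, and the assumption that comment and blank lines should simply be skipped is mine.

```python
def iter_gff_features(lines):
    """Yield the 9 stripped fields of each GFF feature line, stopping at ##FASTA."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence data, not features.
            break
        if not line.strip() or line.startswith("#"):
            # Skip blank lines, "##" directives and "#" comments (assumed behavior).
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 9:
            raise ValueError(
                f"expected 9 tab-separated fields, got {len(fields)}: {line!r}"
            )
        yield fields


# Made-up sample mimicking the layout of a Prokka GFF file.
sample = [
    "##gff-version 3\n",
    "##sequence-region MGS_0 1 1720\n",
    "MGS_0\tProdigal:2.6\tCDS\t1\t1893\t.\t+\t0\tID=EXAMPLE_00001\n",
    "##FASTA\n",
    ">MGS_0\n",
    "ACAGACTTGCCTTTCCCATTC\n",
]
for fields in iter_gff_features(sample):
    print(fields[0], fields[2], fields[8])
```

Breaking out of the loop at "##FASTA" means the sequence lines are never split, so the "expected 9, got 1" unpacking error cannot be triggered by them.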
Hi @steven-bioinfo ,
Thank you very much for the info and for providing a solution. I will include the fix in the master branch. Hopefully, other tools won't make a trend of embedding FASTA sequences within GFF files.
Best, Carlos
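For versions that still raise this error (such as the emapper-2.1.10 run shown in this thread), one possible user-side workaround is to write a copy of the GFF file without the trailing FASTA section and pass that copy to --decorate_gff. The sketch below has not been tested against emapper itself; the function name `strip_fasta_section` and the file names in the usage comment are hypothetical.

```python
def strip_fasta_section(src_path, dst_path):
    """Copy src_path to dst_path, dropping the ##FASTA line and everything after it.

    Returns the number of lines written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("##FASTA"):
                break
            dst.write(line)
            kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
# strip_fasta_section("Sample_Name.gff", "Sample_Name.nofasta.gff")
```

Header and feature lines are copied unchanged, so the trimmed file differs from the original only by the missing sequence block.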
Hi all, I am trying to run eggnog and add the eggnog annotation to an existing .gff output from prokka. When I run the command, it looks like the program cannot recognize the .gff file. How do I fix this?
Thanks,
Here's the code that I used to do that: emapper.py -i panaroo_6_strains_results/pan_genome_reference.fa --itype CDS --translate -o eggnog_result_6_strains_decorate_prokka --output_dir eggnog_pan_reference_6_strains_decorate --decorate_gff prokka_panaroo_apudapuas_results/prokka_panaroo_apudapuas_annotations.gff --cpu 10 --override
Here's the slurm output error:
Functional annotation of hits...
Decorating gff file prokka_panaroo_apudapuas_results/prokka_panaroo_apudapuas_annotations.gff...
Traceback (most recent call last):
File "/home/nguyenjn/.conda/envs/emapperinstall/bin/emapper.py", line 708, in
n, elapsed_time = emapper.run(args, args.input, args.annotate_hits_table, args.cache_file)
File "/home/nguyenjn/.conda/envs/emapperinstall/lib/python3.7/site-packages/eggnogmapper/emapper.py", line 351, in run
n, elapsed_time = self.run_generator(annotated_hits)
File "/home/nguyenjn/.conda/envs/emapperinstall/lib/python3.7/site-packages/eggnogmapper/emapper.py", line 288, in run_generator
for item in generator:
File "/home/nguyenjn/.conda/envs/emapperinstall/lib/python3.7/site-packages/eggnogmapper/deco/decoration.py", line 91, in decorate_gff
g_score, g_strand, g_phase, g_attrs) = list(map(str.strip, line.split("\t")))
ValueError: not enough values to unpack (expected 9, got 1)
emapper-2.1.10
emapper.py -i panaroo_6_strains_results/pan_genome_reference.fa --itype CDS --translate -o eggnog_result_6_strains_decorate_prokka --output_dir eggnog_pan_reference_6_strains_decorate --decorate_gff prokka_panaroo_apudapuas_results/prokka_panaroo_apudapuas_annotations.gff --cpu 10 --override
/home/nguyenjn/.conda/envs/emapperinstall/lib/python3.7/site-packages/eggnogmapper/bin/diamond blastp -d '/gpfs/accounts/epid582w23_class_root/epid582w23_class/shared_data/database/eggnog/eggnog_proteins.dmnd' -q '/home/nguyenjn/balunas_lab/emappertmp_dmdn_f5_96ftz/tmphkfre5_g' --threads 10 -o '/home/nguyenjn/balunas_lab/eggnog_pan_reference_6_strains_decorate/eggnog_result_6_strains_decorate_prokka.emapper.hits' --tmpdir '/home/nguyenjn/balunas_lab/emappertmp_dmdn_f5_96ftz' --sensitive --iterate -e 0.001 --top 3 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
slurm-50372823.out (END)