RKMlab / perf

PERF is an Exhaustive Repeat Finder

--gene-key Error #3

Open Rohit-Satyam opened 2 years ago

Rohit-Satyam commented 2 years ago

Hi!

I was trying to use the updated version of PERF and its new annotation feature on one of my bacterial strains. However, I am getting the following error:

PERF -i ../raw/Tenacibaculum_discolor_gca_003664185.fa --format fasta -a -g ../raw/Tenacibaculum_discolor_gca_003664185.ASM366418v1.49.gff3 --anno-format GFF --gene-key ID

ERROR:

Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 12/12 [00:04<00:00,  2.97it/s]

GeneKeyError:
The attribute "gene_id" is not among the attributes for gene. Please select a different one.
The available ones are [Parent, Name, constitutive, ensembl_end_phase, ensembl_phase, exon_id, rank]

My GFF file contains the following attributes in the last column, but changing the gene key to ID or any other attribute isn't working:

ID=gene:C8N27_0080;biotype=protein_coding;description=cyclophilin family peptidyl-prolyl cis-trans isomerase;gene_id=C8N27_0080;logic_name=ena

When I use a GTF file, the error is:

Using length cutoff of 12
Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 14/14 [00:03<00:00,  3.66it/s]
Traceback (most recent call last):
  File "/home/rohit/miniconda3/bin/PERF", line 8, in <module>
    sys.exit(main())
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 162, in main
    ssr_native(args, length_cutoff=args.min_length)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 106, in ssr_native
    fasta_ssrs(args, repeats_info)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/rep_utils.py", line 253, in fasta_ssrs
    annotate(args)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 160, in annotate
    gffObject = process_annofile(anno_file, annotype, gene_id)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 112, in process_annofile
    attr_obj = process_attrs(attribute, annotype)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 66, in process_attrs
    attr_obj[attrName] = attr[1].strip()
IndexError: list index out of range

I am not sure what is being used in the background to process GFF/GTF files, but I strongly recommend integrating PERF with AGAT, which is an excellent tool for GTF/GFF file processing and handling.

avvaruakshay commented 2 years ago

Hi, sorry you ran into this issue. I can see that you passed ID as the gene identifier. Can you please check whether any of the entries are missing an ID attribute? PERF uses an in-house script for parsing GFF and GTF files, and it may be running into a problem with your file. Thank you for the suggestion to integrate AGAT with PERF. I'll certainly look into it.
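One way to run the check suggested above is sketched below. It is not PERF's parser, only a minimal standalone script that assumes a standard 9-column, tab-separated GFF3 file with key=value attributes. The function names, the contig name "contig1", and the "demo" identifiers are made up for illustration. The exon line mimics the attribute list from the error message (Parent, Name, exon_id, rank), on the guess that exon records are the ones lacking the key.

```python
from collections import Counter


def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field.strip():
            continue
        key, sep, value = field.partition("=")
        if sep:  # ignore fields that have no '=' at all
            attrs[key.strip()] = value.strip()
    return attrs


def features_missing_key(lines, key):
    """Count, per feature type, the records whose attributes lack `key`."""
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if key not in parse_gff3_attributes(cols[8]):
            missing[cols[2]] += 1  # column 3 holds the feature type
    return dict(missing)


# Hypothetical records: "contig1" and the "demo" identifiers are invented.
sample = [
    "##gff-version 3",
    "contig1\tena\tgene\t1\t900\t.\t+\t.\tID=gene:C8N27_0080;biotype=protein_coding;gene_id=C8N27_0080",
    "contig1\tena\texon\t1\t900\t.\t+\t.\tParent=transcript:demo;Name=demo-1;exon_id=demo-1;rank=1",
]
report = features_missing_key(sample, "ID")
print(report)
```

To run it on a real file, pass an open file handle instead of the sample list, for example features_missing_key(open("annotation.gff3"), "ID"). Any feature type that shows up in the report has records without the chosen key.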

avvaruakshay commented 2 years ago

Hi, based on your input files, I downloaded the genome and GFF of "Tenacibaculum_discolor" from NCBI and ran PERF on them.

Command:

PERF -i GCF_003664185.1_ASM366418v1_genomic.fna.gz -g GCF_003664185.1_ASM366418v1_genomic.gff.gz --gene-key ID

Using length cutoff of 12
Processing NZ_RCCS01000003.1: 100%|██████████████████████| 12/12 [00:00<00:00, 18.09it/s]

Generating annotations for identified repeats..
100%|██████████████████████████████████| 2759/2759 [00:00<00:00, 32419.89it/s]

Output:

NZ_RCCS01000004.1   1021    1033    AAAATT  12  -   2   TTTAAT  gene-C8N27_RS00345  427     1491    -   Genic   Promoter    -594
NZ_RCCS01000004.1   1452    1466    AACAC   14  -   2   TGTGT   gene-C8N27_RS00345  427     1491    -   Genic   Promoter    -1025
NZ_RCCS01000004.1   2143    2155    AAAATG  12  -   2   CATTTT  gene-C8N27_RS00350  1668    4418    -   Genic   Promoter    -475
NZ_RCCS01000004.1   2301    2313    AAACG   12  -   2   TTCGT   gene-C8N27_RS00350  1668    4418    -   Genic   Promoter    -633

PERF does not seem to run into any issue with these files. Can you please check your input file?
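For the GTF case, the IndexError in the traceback is raised on attr[1], which suggests (this is an inference, not something confirmed in the thread) that some attribute field in the file has a key with no value. The sketch below is a hedged way to look for such fields. It assumes the usual GTF layout of 9 tab-separated columns with attributes written as key "value"; pairs. The function name and the sample records are hypothetical.

```python
def malformed_gtf_attributes(lines):
    """Return (line_number, field) pairs for attribute fields lacking a value."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((number, "<expected 9 tab-separated columns>"))
            continue
        for field in cols[8].split(";"):
            field = field.strip()
            if not field:
                continue  # a trailing ';' leaves an empty piece
            # A well-formed field splits into a key and a value.
            if len(field.split(None, 1)) < 2:
                problems.append((number, field))
    return problems


# Hypothetical records: "contig1" and the bare "exon_number" are invented.
sample_gtf = [
    "#!genome-build demo",
    'contig1\tena\tgene\t1\t900\t.\t+\t.\tgene_id "C8N27_0080"; gene_biotype "protein_coding";',
    'contig1\tena\texon\t1\t900\t.\t+\t.\tgene_id "C8N27_0080"; exon_number;',
]
print(malformed_gtf_attributes(sample_gtf))
```

One caveat: the sketch splits on every semicolon, so a quoted value that itself contains a semicolon can be reported as malformed. Such a line is still worth inspecting, since a parser that splits the same way would stumble on it too.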