gbouras13 / pharokka

fast phage annotation program
MIT License

Expanding initial checks for user friendliness #293

Closed: thauptfeld closed 1 year ago

thauptfeld commented 1 year ago

Description

When pharokka is run on a set of contigs, duplicate FASTA IDs crash the run. However, the crash only happens towards the end of the run, when the .gff file is converted to GenBank. It is of course on the user to make sure that the input FASTA files don't contain duplicate IDs, but it would be more user friendly if pharokka checked for them at the beginning of the run and exited as soon as it found one. That way, the user loses far less time if there is a mistake in the file.
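To make the request concrete, here is a rough sketch of the kind of up-front check I have in mind. It is not taken from pharokka's code: the function names `find_duplicate_fasta_ids` and `check_unique_ids` are made up for illustration, and I am assuming that a record's ID is the header text up to the first whitespace.

```python
import sys
from collections import Counter


def find_duplicate_fasta_ids(fasta_path):
    """Return a sorted list of record IDs that occur more than once.

    Hypothetical helper, not part of pharokka.
    """
    counts = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the ID is the header up to the first whitespace.
                fields = line[1:].split()
                counts[fields[0] if fields else ""] += 1
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


def check_unique_ids(fasta_path):
    """Exit early with a readable message if any ID is duplicated."""
    duplicates = find_duplicate_fasta_ids(fasta_path)
    if duplicates:
        sys.exit(
            f"Duplicate FASTA IDs found in {fasta_path}: {', '.join(duplicates)}"
        )
```

Calling something like `check_unique_ids("phage_contigs_tiny.fasta")` before any annotation step would stop the run within seconds instead of failing at the GenBank conversion.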

What I Did

pharokka.py -i phage_contigs_tiny.fasta -o output_dir -d ./pharokka_db/ -t 72 -m

This was in the log:

2023-09-14 09:04:16.829 | INFO     | post_processing:process_vfdb_results:2134 - Processing VFDB output.
2023-09-14 09:04:17.190 | INFO     | post_processing:process_vfdb_results:2197 - 15 VFDB virulence factors identified.
2023-09-14 09:04:19.505 | INFO     | post_processing:process_card_results:2241 - Processing CARD output.
2023-09-14 09:04:19.920 | INFO     | post_processing:process_card_results:2300 - 24 CARD AMR genes identified.
2023-09-14 09:55:39.322 | INFO     | __main__:main:404 - Converting gff to genbank.

Unfortunately, I no longer have the error message from stdout, but it did say that the crash was caused by duplicate FASTA IDs.

Great tool!