gbouras13 / pharokka

fast phage annotation program
MIT License
137 stars 13 forks

Database issues #346

Closed ZoseJapata closed 1 month ago

ZoseJapata commented 1 month ago

Description

Hello, I am trying to detect phages in assembled sequence files in .fna (FASTA) format. I was hoping to run Pharokka to accomplish this, but ran into an error that stated: "The database directory was unsuccessfully checked. Please run install_databases.py."

1) I then ran install_databases.py to try to resolve the issue, but this did not work.

2) I then ran install_databases.py again and directed its installation into the folder containing my .fna assembled sequence files, which I believe was successful because files were placed into it.

3) After this, I ran Pharokka once more but received the same error, which instructed me to run install_databases.py again.

-f was also used to overwrite the previous files that were created in the failed attempts.

I'm quite new to bioinformatics programs, so if you have any recommendations for organizing the databases and getting Pharokka running, would you be able to help me? Thank you so much!

What I Did

1) install_databases.py -d

2) install_databases.py -o [/Shared/User/Cases/phage_identification]

3) pharokka.py -i AF9000-IA-VL00618-240524d_S146.fna -o AF9000Phages -t -f

The database directory was unsuccessfully checked. Please run install_databases.py

gbouras13 commented 1 month ago

Hi @ZoseJapata ,

You need to pass the database location from step 2 to pharokka.py with -d (since step 1 doesn't work, as you say).

Try:

install_databases.py -o /Shared/User/Cases/phage_identification
pharokka.py -i AF9000-IA-VL00618-240524d_S146.fna -o AF9000Phages -t -f -d /Shared/User/Cases/phage_identification

George
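Before passing a directory with -d, it can help to confirm that the path exists and actually contains files. The following is a minimal sketch, assuming a bash shell; the function name `db_dir_looks_populated` is hypothetical, and the check is only a rough pre-flight test that does not replicate Pharokka's own database validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough pre-flight check on a database directory.
# It only tests that the path is an existing, non-empty directory.
# It does NOT replicate Pharokka's own check of the database files.
db_dir_looks_populated() {
    local dir="$1"
    # -d: path exists and is a directory; ls -A: lists entries, including hidden ones
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

For example, with `DB_DIR=/Shared/User/Cases/phage_identification`, one could run `db_dir_looks_populated "$DB_DIR" && pharokka.py -i input.fna -o output_dir -d "$DB_DIR"`, where `input.fna` and `output_dir` are placeholders. Keeping the path in a single variable makes it harder for the install and run steps to point at different folders.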

mdtorohernando commented 1 month ago

Hi George! Another person having problems with the database! I downloaded it from the web with wget because install_databases.py gives me an error:

(Pharokka) pgen@pgen:/media/pgen/Disco_2/curso_leon/2024$ install_databases.py 
2024-06-04 17:05:54.120 | INFO     | __main__:<module>:40 - --outdir was not specified.
2024-06-04 17:05:54.120 | INFO     | __main__:<module>:41 - Downloading databases to the default directory anyway.
2024-06-04 17:05:54.120 | INFO     | databases:instantiate_install:111 - Checking Pharokka database installation in /home/pgen/miniconda3/envs/Pharokka/bin/../databases/.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:213 - PHROGs Databases are missing.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:220 - VFDB Databases are missing.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:227 - CARD Databases are missing.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:233 - PHROGs Annotation File is missing.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:239 - INPHARED Mash Annotation File is missing.
2024-06-04 17:05:54.121 | INFO     | databases:check_db_installation:245 - INPHARED Mash Sketch File is missing.
2024-06-04 17:05:54.121 | INFO     | databases:instantiate_install:116 - Some Databases are missing.
2024-06-04 17:05:54.121 | INFO     | databases:instantiate_install:121 - Downloading Pharokka Databases from https://zenodo.org/record/8276347/files/pharokka_v1.4.0_databases.tar.gz.
2024-06-04 17:05:59.330 | ERROR    | databases:download:161 - ERROR: Could not download file from Zenodo! url=https://zenodo.org/record/8276347/files/pharokka_v1.4.0_databases.tar.gz, path=/home/pgen/miniconda3/envs/Pharokka/bin/../databases/pharokka_v1.4.0_databases.tar.gz

So I downloaded it this way, and all seems to be OK:

wget "https://zenodo.org/record/8267900/files/pharokka_v1.4.0_databases.tar.gz"
tar -xzf pharokka_v1.4.0_databases.tar.gz

Next, I tried to run Pharokka this way:

pharokka.py -i ULE_I140.hybrid.fasta -o prueba -d pharokka_v1.4.0_databases/ -t 36 -p prueba

But unfortunately:

2024-06-04 17:06:50.349 | INFO     | __main__:main:95 - Starting Pharokka v1.7.2
2024-06-04 17:06:50.349 | INFO     | __main__:main:96 - Command executed: Namespace(infile='ULE_I140.hybrid.fasta', outdir='prueba', database='pharokka_v1.4.0_databases/', threads='36', force=False, prefix='prueba', locustag='Default', gene_predictor='default', meta=False, split=False, coding_table='11', evalue='1E-05', fast=False, mmseqs2_only=False, meta_hmm=False, dnaapler=False, custom_hmm='', genbank=False, terminase=False, terminase_strand='nothing', terminase_start='nothing', skip_extra_annotations=False, skip_mash=False, minced_args='', mash_distance=0.2, citation=False)
2024-06-04 17:06:50.349 | INFO     | __main__:main:97 - Repository homepage is https://github.com/gbouras13/pharokka
2024-06-04 17:06:50.349 | INFO     | __main__:main:98 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-06-04 17:06:50.349 | INFO     | __main__:main:100 - Checking database installation in pharokka_v1.4.0_databases/.
2024-06-04 17:06:50.350 | INFO     | databases:check_db_installation:220 - VFDB Databases are missing.
2024-06-04 17:06:50.350 | ERROR    | __main__:main:105 - The database directory was unsuccessfully checked. Please run install_databases.py.

Thanks! María

gbouras13 commented 1 month ago

Hi @mdtorohernando ,

I think the initial issue with install_databases.py is caused by Zenodo playing up from time to time.

With the manual download, try

wget "https://zenodo.org/record/8276347/files/pharokka_v1.4.0_databases.tar.gz"
tar -xzf pharokka_v1.4.0_databases.tar.gz

Which should hopefully work instead.

George
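Since the Zenodo download can fail intermittently, one option is to wrap the manual download in a small retry loop. This is a sketch only, assuming a bash shell; the `retry` function is a hypothetical helper, not part of Pharokka, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times,
# sleeping <delay> seconds between failed attempts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local n=1
    # Re-run the command until it succeeds or the attempts are used up.
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # every attempt failed
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

It could be used as `retry 3 30 wget "https://zenodo.org/record/8276347/files/pharokka_v1.4.0_databases.tar.gz"`, followed by the same `tar -xzf` step as above.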

mdtorohernando commented 1 month ago

Hi George, thanks for your quick response! Yes, I did the manual download and all was OK. I have the folder with the Pharokka db v1.4.0, but when I run the Pharokka command pointing to this folder, it says:

2024-06-04 17:06:50.350 | ERROR | __main__:main:105 - The database directory was unsuccessfully checked. Please run install_databases.py.

mdtorohernando commented 1 month ago

Oops, more info! After running the command, the log file indicates:

2024-06-04 17:13:34.407 | INFO     | __main__:main:95 - Starting Pharokka v1.7.2
2024-06-04 17:13:34.407 | INFO     | __main__:main:96 - Command executed: Namespace(infile='ULE_I140.hybrid.fasta', outdir='prueba', database='pharokka_v1.4.0_databases/', threads='36', force=True, prefix='prueba', locustag='Default', gene_predictor='default', meta=False, split=False, coding_table='11', evalue='1E-05', fast=False, mmseqs2_only=False, meta_hmm=False, dnaapler=False, custom_hmm='', genbank=False, terminase=False, terminase_strand='nothing', terminase_start='nothing', skip_extra_annotations=False, skip_mash=False, minced_args='', mash_distance=0.2, citation=False)
2024-06-04 17:13:34.407 | INFO     | __main__:main:97 - Repository homepage is https://github.com/gbouras13/pharokka
2024-06-04 17:13:34.407 | INFO     | __main__:main:98 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-06-04 17:13:34.408 | INFO     | __main__:main:100 - Checking database installation in pharokka_v1.4.0_databases/.
2024-06-04 17:13:34.408 | INFO     | databases:check_db_installation:220 - VFDB Databases are missing.

But when I take a look in the Pharokka db folder (named pharokka_v1.4.0_databases by default in the download), these are the files in it. VFDB seems to be present...

[image: contents of the database folder]

gbouras13 commented 1 month ago

Sorry @mdtorohernando, the modification in my response was the URL, which now points to version 2 of the database (not version 1, as the documentation had): '8276347', not '8267900', in the URL. Version 2 has the correct VFDB files (version 1 didn't!). I have also fixed the documentation in the repository, so thanks for this :)

Please redownload the database and try again - let me know how you go!

George
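Because the two URLs differ only in the record number, the mix-up is easy to miss by eye. A minimal sketch of checking a download URL against the record ID named above, assuming a bash shell and the URL layout `https://zenodo.org/record/<id>/files/<name>`; the function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Record ID that this thread identifies as version 2 of the databases.
EXPECTED_RECORD="8276347"

# Hypothetical helper: print the record ID from a URL of the form
# https://zenodo.org/record/<id>/files/<name>
zenodo_record_id() {
    local url="$1"
    local rest="${url#*/record/}"   # drop everything up to and including "/record/"
    printf '%s\n' "${rest%%/*}"     # keep only the text before the next "/"
}

# Hypothetical helper: return 0 if the URL points at the expected record.
is_expected_record() {
    [ "$(zenodo_record_id "$1")" = "$EXPECTED_RECORD" ]
}
```

Calling `is_expected_record` on the URL before running wget would flag the older '8267900' link with a non-zero exit status.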

mdtorohernando commented 1 month ago

Thank you, and sorry! I didn't realize the version had changed. It is running now (60 min so far), so I hope all is OK. I'll let you know!

Thanks!