Open luisalbertoc95 opened 11 months ago
Hi @luisalbertoc95 ,
Thanks for reporting this bug and using Pharokka! I see you're using Phables too :)
I'm pretty sure this has to do with the VFDB naming (it's annoying :) ).
Would you be able to do a few things:
1. Re-run pharokka with --hmm_only. It should work to get all the PHROG annotations, but it will skip the CARD and VFDB steps. So do that if you're in a hurry.
2. Send me the vfdb_results.tsv file: George.bouras@adelaide.edu.au (it should be small enough to email or attach here). I'm pretty sure it's because one of the VFDB outputs has a strange character, and if so I will implement a fix soon once I can replicate the error.
George
Hi George,
Thanks a lot for your suggestions. Running the code with --hmm_only worked! I'll send the vfdb_results.tsv to you.
Thank you,
Luis
Hi @luisalbertoc95 ,
It took a while but I solved this error - it was a bug in pharokka to do with matching VFDB and other outputs.
If you re-run pharokka now it should work (but seemingly you were happy enough with --hmm_only so maybe you've moved on)
George
Hello! I'm running pharokka 1.6.1 (fresh env and database install), and I'm still getting the same error (below). Running in --fast mode fixes the problem, so I think it has to do with the VFDB/CARD databases.
Pharokka version: 1.6.1 Python 3.10.8 OS: Linux, 3.10.0
Command: pharokka.py -i file.fna -f -o test.out -d /x/x/x/pharokka_db/ -t 32 -m -g prodigal --skip_mash
2024-01-22 20:59:20.921 | INFO | __main__:main:379 - Post Processing Output.
2024-01-22 20:59:23.455 | INFO | post_processing:create_mmseqs_tophits:2104 - Processing MMseqs2 outputs.
2024-01-22 20:59:23.455 | INFO | post_processing:create_mmseqs_tophits:2105 - Processing PHROGs output.
2024-01-22 20:59:30.113 | INFO | post_processing:process_vfdb_results:2309 - Processing VFDB output.
2024-01-22 20:59:30.149 | INFO | post_processing:process_vfdb_results:2368 - 17 VFDB virulence factors identified.
Traceback (most recent call last):
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/pharokka.py", line 499, in <module>
    main()
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/pharokka.py", line 418, in main
    pharok.process_results()
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/post_processing.py", line 356, in process_results
    (merged_df, vfdb_results) = process_vfdb_results(
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/post_processing.py", line 2369, in process_vfdb_results
    merged_df[["genbank", "desc_tmp", "vfdb_species"]] = merged_df[
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/frame.py", line 4287, in __setitem__
    self._setitem_array(key, value)
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/frame.py", line 4329, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
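For anyone reading the traceback above: pandas raises this ValueError when a list of N column names is assigned a DataFrame that does not have exactly N columns. The sketch below reproduces that in isolation and shows one defensive way to pad the result. It is not pharokka's actual code: the assumption that the right-hand side comes from a `str.split(..., expand=True)` call, the `hit` column, the `|` separator, the sample values, and the `split_fixed` helper are all made up for illustration.

```python
import pandas as pd


def split_fixed(series: pd.Series, sep: str, names: list) -> pd.DataFrame:
    """Split each string on `sep` into exactly len(names) columns.

    Hypothetical helper, not part of pharokka. A single-character `sep`
    is treated by pandas as a literal string, not a regex.
    """
    # Cap the number of splits so extra fields stay in the last column.
    parts = series.str.split(sep, n=len(names) - 1, expand=True)
    # Pad with NaN columns if every row produced fewer fields than expected.
    parts = parts.reindex(columns=range(len(names)))
    parts.columns = names
    return parts


# Made-up data: every row has only two fields, but three columns are expected.
df = pd.DataFrame({"hit": ["gb|AAA11111", "gb|BBB22222"]})
wanted = ["genbank", "desc_tmp", "vfdb_species"]

try:
    # Two columns on the right, three keys on the left.
    df[wanted] = df["hit"].str.split("|", expand=True)
except ValueError as err:
    print("reproduced:", err)

# The padded version always has the expected width, so assignment succeeds.
df[wanted] = split_fixed(df["hit"], "|", wanted)
print(df)
```

Padding only hides the mismatch, so it is worth inspecting the rows that end up with missing values to see which input names caused the short split.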
Hi, I am having this issue as well on a fresh mamba+pharokka (1.7.1) install.
pharokka.py -i vir.fa -o vir.prk -d ~/data/pharokka
Same error. Adding --hmm_only or --fast did not help. Happy to provide additional information that could help debug this!
Hi @fluhus ,
how big is your input? Is it very small? I have a feeling this error may be because MMseqs2 found no hits at all. I’ll try and replicate later this week and put in a fix if so.
george
Hi @fluhus,
I have narrowed down your error to the '#' in the header. If you remove this it will work. I'll put in a bug fix at some point :)
George
Thanks for looking into this! I removed the # signs from the names and now it runs :)
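In case it helps others who hit the same thing, here is a minimal sketch of the workaround described above: stripping '#' from FASTA header lines before running pharokka. The function name and the file names in the usage comment are hypothetical, and this is only one way to do it.

```python
def strip_hash_from_headers(lines, replacement=""):
    """Yield FASTA lines with '#' replaced in header lines only.

    Hypothetical helper. Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line.replace("#", replacement)
        else:
            yield line


def clean_fasta(in_path, out_path, replacement=""):
    """Write a copy of the FASTA file at in_path with '#' removed from headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_hash_from_headers(src, replacement))


# Usage (file names are placeholders):
# clean_fasta("vir.fa", "vir.clean.fa")
```

Removing characters can make two previously distinct names identical, so it is worth checking that the headers are still unique afterwards; passing a replacement such as "_" keeps the header length unchanged.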
Description
Hi @gbouras13, when trying to run pharokka_proteins.py on a set of 755001 ORFs, I get an error due to a mismatch in lengths between the keys and columns in the pandas DataFrame. According to the log file, all mmseqs searches were completed.
Thank you!
What I Did
pharokka_proteins_1698789518.5425682.log