Closed: k6logc closed this issue 8 months ago
Hi @k6logc,
I take it (based on a Google of the contig names) these are from the Human Oral Microbiome Database, so I can hopefully reproduce this with the exact contigs.
With error 2, did you specify a custom numeric locus tag? It is erroring in the string parsing step with locus tags.
With error 1, my guess is that Prodigal predicts no genes on that contig, which causes an error when a custom DB is used.
Regarding "The entire folders for the contigs that error out are automatically deleted": are you running this with a workflow manager? If Pharokka errors out it shouldn't delete anything, so that is very strange.
I'll try and reproduce the bugs and get back to you.
George
Having reproduced error 1 and looked a bit deeper, my earlier guess was wrong.
The error is raised by the line below, which is reached when there are 2 identically scored hits to different custom HMM profiles in your custom database.
if best_results[result.protein].custom_hmm_id != hit.custom_hmm_id:
AttributeError: 'pyhmmer.plan7.Hit' object has no attribute 'custom_hmm_id'
A fix will be implemented in v1.6.
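For anyone following along, the failure mode can be sketched without pyhmmer. The traceback shows that the raw hit object has no `custom_hmm_id`, so any tie-handling comparison that reads it from the hit will fail. The snippet below is only an illustration under assumptions: `RawHit`, `BestResult` and `update_best` are hypothetical stand-ins, not pharokka's or pyhmmer's actual code, and the tie-breaking rule (keep the first profile seen) is one possible choice, not necessarily what v1.6 does.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RawHit:
    """Hypothetical stand-in for a search hit: it has no profile id of its own."""
    protein: str
    bitscore: float


@dataclass
class BestResult:
    """Hypothetical record of the best-scoring profile for one protein."""
    protein: str
    custom_hmm_id: str
    bitscore: float


def update_best(best: Dict[str, BestResult], hmm_id: str, hit: RawHit) -> None:
    """Keep the highest-scoring profile per protein; exact ties keep the first seen."""
    current = best.get(hit.protein)
    if current is None or hit.bitscore > current.bitscore:
        # The profile id is passed in separately instead of being read from `hit`.
        best[hit.protein] = BestResult(hit.protein, hmm_id, hit.bitscore)
    # On an exact tie nothing is replaced, so no missing attribute is ever accessed.


best: Dict[str, BestResult] = {}
update_best(best, "profile_A", RawHit("prot_1", 50.0))
update_best(best, "profile_B", RawHit("prot_1", 50.0))  # same score, different profile
tie_winner = best["prot_1"].custom_hmm_id
update_best(best, "profile_C", RawHit("prot_1", 72.5))  # strictly better score
final_winner = best["prot_1"].custom_hmm_id
```

Passing the profile id as its own argument keeps the tie case from depending on what attributes the hit object happens to expose.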
George
Hi @gbouras13,
Thank you so much for having a look and sorting out what's going on.
Great that you plan to implement a fix for error 1 in v1.6, thank you. Do you have a sense of approximately when v1.6 will be out?
For error 2: no, we are not currently using custom numeric locus tags. You are right that the contigs come from HOMD.org. We are using the PROKKA versions (https://www.homd.org/ftp/genomes/PROKKA/V10.1/fna/), so the headers are prefixed with SEQFxxxxx.x. We pass them to geNomad and currently take the predicted phage regions into pharokka (as, for example, SEQF10001.1_JAAE01000196.1.fna, SEQF10002.1_LVER01000014.1.fna, or SEQF10001.1_JAAE01000004.1_provirus_6680_45243.fna) using a snakemake workflow (looping in @AmrutaIdagunji, who is working on this with me). Ideally we would update to passing in gbks, but we ran into some issues with this and will revisit.
Thank you!
Best, Kathryn
Hi @k6logc ,
Hopefully v1.6 will be done sometime in the next week; I'm just working through all the issues in the repository that have piled up since November. Your approach seems very reasonable to me!
The Snakemake wrapper would explain the deletion of the folders. It's a Snakemake thing: if a job errors, Snakemake deletes the files that job generated.
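Because the output folder disappears when a job fails, one way to keep the error message is to write the tool's stdout and stderr to a log file outside that folder. The sketch below is an assumption-laden illustration rather than part of pharokka or the workflow discussed here: `run_and_keep_log`, the demo command and the log path are all hypothetical, and the demo command is a deliberately failing placeholder, not a pharokka call.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_and_keep_log(cmd: List[str], log_path: Path) -> int:
    """Run `cmd`, send stdout and stderr to `log_path`, and return the exit code."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        # stderr is merged into stdout so the traceback lands in the same file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Placeholder command that fails on purpose (stands in for the real annotation call).
demo_dir = Path(tempfile.mkdtemp())
demo_log = demo_dir / "logs" / "contig.log"
demo_cmd = [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(1)"]
exit_code = run_and_keep_log(demo_cmd, demo_log)
logged_text = demo_log.read_text()
```

Within Snakemake itself, files declared under a rule's `log:` directive are kept when the job fails, and the `--keep-incomplete` command-line option stops Snakemake from removing a failed job's outputs, which may be simpler than wrapping the command.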
With error 2, pharokka is assuming the locus tag is a float (not a string). That was some bad coding on my part, which left the type ambiguous.
While I can't reproduce the error (I've tried a few different ways), I have put in a fix that should hopefully resolve it.
George
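To illustrate the float-versus-string ambiguity in general terms: if an identifier is run through numeric type inference, a numeric-looking value becomes a float and any later string operation on it fails. The sketch below is a generic illustration under assumptions, not pharokka's code; `naive_infer`, `read_tag_as_text` and `tag_prefix` are hypothetical helpers, and the bare tag `10001.1` is a made-up example value.

```python
from typing import Union


def naive_infer(value: str) -> Union[float, str]:
    """Hypothetical loader behaviour: numeric-looking text silently becomes a float."""
    try:
        return float(value)
    except ValueError:
        return value


def read_tag_as_text(value: str) -> str:
    """Keep the tag exactly as written, with no numeric inference."""
    return value.strip()


def tag_prefix(tag: str) -> str:
    """Return the part of the tag before the first underscore."""
    return tag.split("_")[0]


numeric_looking = "10001.1"  # made-up tag that happens to parse as a number
inferred = naive_infer(numeric_looking)

try:
    tag_prefix(inferred)  # a float has no .split, so this raises
    naive_failed = False
except AttributeError:
    naive_failed = True

# Handling the value as text from the start avoids the problem.
prefix = tag_prefix(read_tag_as_text(numeric_looking))
homd_prefix = tag_prefix(read_tag_as_text("SEQF10001.1_JAAE01000196.1"))
```

If the table is loaded with pandas, passing `dtype=str` to `read_csv` for the relevant column is a common way to get the same effect.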
Hi @gbouras13,
Awesome re: the v1.6 update, excited to update, good luck!
And thank you for rolling in a fix for error 2 and for explaining that Snakemake is behind the deletions.
Best wishes, Kathryn
Description
Dear George,
Thank you for pharokka! We have it running well overall, but some contigs are dropping out and I can't tell what's going on. The entire folders for the contigs that error out are automatically deleted. There is nothing remarkable I can see about the dropped contigs in terms of sequence or name. Any guidance very much appreciated!
Best, Kathryn
What I Did
Example 1 (contig JAAE01000196.1; yes, super short but there are others that are shorter): Errors out at customdb step (runs fine for other contigs):
Example 2 (contig LVER01000014.1): Errors out after CARD AMR Step (runs fine for other contigs):