gbouras13 / phold

Phage Annotation using Protein Structures
MIT License

anti-CRISPR proteins get assigned the phrog cluster "acr" instead of a number #71

Open Fabian-Bastiaanssen opened 18 hours ago

Fabian-Bastiaanssen commented 18 hours ago

Description

I was running Phold on a GenBank file (.gbk) so that I could run Phynteny on the output afterwards. However, the anti-CRISPR proteins were assigned "acr" as their phrog value instead of the number of the PHROG cluster they belong to. The same thing happens with VFDB hits, and I imagine with CARD hits as well.

What I Did

phold run -i pharokka.gbk -o phold/sequence_1/ -t 6 -d /phold/20240322 -f
 CDS             complement(33955..34404)
                 /ID="BUHCUXVW_CDS_0014"
                 /transl_table=11
                 /phrog="acr"
                 /locus_tag="BUHCUXVW_CDS_0014"
                 /function="moron, auxiliary metabolic gene and host
                 takeover"
                 /product="anti-CRISPR protein"
                 /source="Pyrodigal-gv_0.3.1"
                 /translation="MAKFIGVKMIEVVPMTAREANDKGHRIGNHSFEEDGYEVTYPNGY
                 KSWSPAKEFEKAYYKLEDPAGDVLKENDIKRFIKGIENVKVGTKTTNTTLTCLTGFEVH
                 GQAACVKPENFDLNVGSNYAQIKAEDKIWEGLGFVLQWAKYGLKK*"
gbouras13 commented 2 hours ago

Hi @Fabian-Bastiaanssen ,

This is by design in Phold: we want to represent the fact that the protein's top hit is not a PHROG but rather an entry in acrDB. The same applies to CARD and VFDB hits. The issue is more on the Phynteny side, which as it currently stands will break on acrs (there is already a fix for CARD and VFDB). Susie is working on a big update at the moment, so it might be a while before she gets to this, but she knows about it. Thanks for alerting me.

George
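
For anyone who needs to work with these files before a downstream fix lands, here is a minimal sketch of how the /phrog qualifier could be read defensively, treating a numeric value as a PHROG cluster and a non-numeric value as a database label. It is not code from Phold or Phynteny. The function names are hypothetical, and only the "acr" label is confirmed by the record above; the exact strings "vfdb" and "card" are assumptions based on this thread.

```python
import re

# Labels that may appear in /phrog instead of a cluster number.
# "acr" is taken from the record in this issue; "vfdb" and "card"
# are ASSUMED spellings and should be checked against real output.
NON_PHROG_LABELS = {"acr", "vfdb", "card"}

# Matches a qualifier such as /phrog="acr" or /phrog="1234".
PHROG_QUALIFIER = re.compile(r'/phrog="([^"]*)"')


def classify_phrog(value):
    """Classify one /phrog qualifier value (hypothetical helper).

    Returns ("phrog", int) for a numeric cluster, ("database", label)
    for a known non-PHROG label, and ("unknown", value) otherwise.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return ("phrog", int(value))
    if value.lower() in NON_PHROG_LABELS:
        return ("database", value.lower())
    return ("unknown", value)


def phrog_values(genbank_text):
    """Classify every /phrog qualifier found in a block of GenBank text."""
    return [classify_phrog(v) for v in PHROG_QUALIFIER.findall(genbank_text)]


# Excerpt of the feature reported in this issue.
example = '''
     CDS             complement(33955..34404)
                     /ID="BUHCUXVW_CDS_0014"
                     /transl_table=11
                     /phrog="acr"
                     /product="anti-CRISPR protein"
'''

print(phrog_values(example))
```

A downstream tool could then skip or specially encode the "database" entries instead of trying to convert them to integers.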