Finn-Lab / SanntiS

SMBGC Annotation using Neural Networks Trained on Interpro Signatures
Apache License 2.0

Missing gene information #4

Closed lisizzz closed 10 months ago

lisizzz commented 10 months ago

Hi,

I'm a bit confused. Is SanntiS only designed to predict BGCs themselves, without predicting the genes within these clusters? I ask because the output files appear to contain only information about the predicted BGCs, without any gene-related information. Many other BGC detection tools, such as antiSMASH, DeepBGC, and GECCO, typically include one or more GenBank files that contain information about the genes within the predicted BGCs. Is there a way to get the gene information using SanntiS?

SantiagoSanchezF commented 10 months ago

Hello,

Thank you for reaching out. Let me try to clarify this.

SanntiS does indeed predict BGCs and also provides information about the genes within these clusters. The main outputs you can expect from SanntiS are:

A GFF file, which contains the coordinates of the predicted BGCs. This file includes details like the most probable class of each BGC and the most similar known BGC from the MIBiG database, which can give insights into the potential function of the predicted BGC.

Alongside the GFF file, SanntiS generates a file with a .prodigal.faa extension. This file contains the predicted genes within the BGCs and the sequences of proteins that these genes are expected to encode. So, while the GFF file gives you the location and possible function of BGCs, the .prodigal.faa file gives you the specifics on the genes and proteins that are part of these clusters.

Additionally, a .prodigal.faa.ip.tsv file is produced by running the protein sequences through InterProScan, which provides functional annotations of the genes predicted by SanntiS. This means that not only do you get to know what genes are present, but you also get information on the potential functions of these genes based on known protein domains and features.

So, in summary, SanntiS does cover both BGC prediction and gene/protein prediction within these clusters. If you're looking for gene-related information, you'll find it in the .prodigal.faa and .prodigal.faa.ip.tsv files.
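To list the genes that fall inside each predicted BGC, one option is to cross-reference the GFF coordinates with the gene coordinates in the .prodigal.faa headers. The following is a minimal sketch, not part of SanntiS itself. It assumes that the GFF is a standard 9-column tab-separated file whose first column matches the contig name, and that the .prodigal.faa headers follow Prodigal's usual layout (`>contig_N # start # end # strand # ...`). The function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def read_bgc_regions(gff_lines):
    """Yield (contig, start, end, attributes) for each feature line.

    Assumes standard 9-column, tab-separated GFF records.
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature record
        yield cols[0], int(cols[3]), int(cols[4]), cols[8]


def read_prodigal_genes(faa_lines):
    """Yield (protein_id, contig, start, end) from FASTA header lines.

    Assumes Prodigal-style headers: '>contig_N # start # end # strand # ...'.
    """
    for line in faa_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no coordinates
        fields = line[1:].split(" # ")
        if len(fields) < 3:
            continue  # header without coordinate fields
        protein_id = fields[0].strip()
        # Prodigal appends '_<gene number>' to the contig name.
        contig = protein_id.rsplit("_", 1)[0]
        yield protein_id, contig, int(fields[1]), int(fields[2])


def genes_per_bgc(gff_lines, faa_lines):
    """Map each (contig, start, end) BGC region to the overlapping protein IDs."""
    genes_by_contig = defaultdict(list)
    for protein_id, contig, start, end in read_prodigal_genes(faa_lines):
        genes_by_contig[contig].append((protein_id, start, end))

    result = {}
    for contig, bgc_start, bgc_end, _attrs in read_bgc_regions(gff_lines):
        # A gene is reported if its interval overlaps the BGC interval at all.
        result[(contig, bgc_start, bgc_end)] = [
            pid
            for pid, start, end in genes_by_contig[contig]
            if start <= bgc_end and end >= bgc_start
        ]
    return result
```

Both functions accept any iterable of lines, so open file handles for the GFF and .prodigal.faa outputs can be passed directly to `genes_per_bgc`. The resulting protein IDs should match the first column of a standard InterProScan TSV, which, assuming the .prodigal.faa.ip.tsv file follows that layout, makes it possible to look up the functional annotations for each gene in a cluster.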

I hope that helps. Let me know if you have any other questions.

Santiago

lisizzz commented 10 months ago

Thank you for the explanation, it helped a lot.