HGuo-HKI opened this issue 4 years ago.
Hey @alexeigurevich, can you please look into this issue?
Hi all!
Thanks for reporting the issue. The problem is that the current version of MetaMiner can't accept genome sequences in the regular .gbk format (see below for what I mean by "regular").
The currently accepted sequence formats are:
1. Nucleotide sequences in .fasta format (a high-quality reference or a draft assembly).
2. AntiSMASH output in .final.gbk or .gbk format (it contains specific tags like sec_met and biosynthetic, which are essential for MetaMiner).
3. An .annotated.txt file.
4. Amino acid sequences in .fasta format (e.g. extracted from BOA, antiSMASH, or other prediction tool output).

See more details in the online documentation (the link is on the workflow configuration page).
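Options (1) and (4) are both .fasta files, so the content has to reveal whether the input is nucleotide or amino acid. The snippet below is a minimal sketch of one way to tell them apart, assuming a simple composition heuristic: if at least 90% of residues are A/C/G/T/U/N, call it nucleotide. The function names `read_fasta` and `guess_alphabet` and the 0.9 threshold are hypothetical; this is not how MetaMiner itself detects the input type.

```python
from typing import Dict

# Characters counted as "nucleotide" (IUPAC N included for unknown bases).
NUCLEOTIDE_CHARS = set("ACGTUN")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; the id is the first word of the header."""
    records: Dict[str, str] = {}
    current = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks = []
        elif current is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line.upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records


def guess_alphabet(text: str, threshold: float = 0.9) -> str:
    """Hypothetical heuristic: 'nucleotide' if >= threshold of residues are A/C/G/T/U/N."""
    seq = "".join(read_fasta(text).values())
    if not seq:
        raise ValueError("no sequence data found")
    nucleotide_count = sum(1 for c in seq if c in NUCLEOTIDE_CHARS)
    return "nucleotide" if nucleotide_count / len(seq) >= threshold else "amino acid"
```

For example, `guess_alphabet(">contig_1\nATGCGTACGTTAG\n")` returns `"nucleotide"`, which would correspond to option (1), while a protein FASTA falls below the threshold and would map to option (4).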
We recommend option (1). In this case, MetaMiner searches the entire genome for specific motifs related to various RiPP classes (cyanobactins, linaridins, etc.). In this search, MetaMiner tries all six possible translation frames to convert nucleotides into amino acids.
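To illustrate what "all six translation frames" means, here is a minimal sketch, assuming the standard genetic code (NCBI translation table 1): three frames on the forward strand (offsets 0, 1, 2) and three on the reverse complement. The helper names are hypothetical and this is not MetaMiner's code; stop codons are shown as `*` and codons containing ambiguous bases as `X`.

```python
# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; an incomplete trailing codon is dropped, unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return {frame_label: protein} for frames +1, +2, +3 and -1, -2, -3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For `"ATGGCCTAA"`, frame +1 gives `"MA*"` and frame -1 (reading `TTAGGCCAT`) gives `"LGH"`. Searching every frame is what lets a genome-wide scan find short RiPP precursor genes without relying on an existing CDS annotation.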
In your particular case, MetaMiner tried to interpret the .gbk file as option (2), and since your .gbk file does not contain the specific antiSMASH output tags, the workflow crashed.
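A check like the one below could distinguish the two kinds of .gbk input before choosing option (2). It is a minimal sketch, assuming that looking for the sec_met and biosynthetic tags anywhere in the FEATURES table is a good enough signal; the function names are hypothetical, and MetaMiner's real parser may inspect these tags differently.

```python
# Tags named in this thread as essential for option (2).
ANTISMASH_MARKERS = ("sec_met", "biosynthetic")


def feature_table_lines(gbk_text):
    """Yield the lines between each record's FEATURES header and its ORIGIN (or // terminator)."""
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
        elif in_features:
            yield line


def has_antismash_tags(gbk_text):
    """Hypothetical heuristic: True if any feature-table line mentions an antiSMASH marker."""
    return any(
        marker in line
        for line in feature_table_lines(gbk_text)
        for marker in ANTISMASH_MARKERS
    )
```

Restricting the search to the feature table avoids false positives from free-text fields such as DEFINITION, which may mention "biosynthetic" in an ordinary NCBI record.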
The current workaround is to convert the .gbk file into a FASTA file (or to download the genome from NCBI in FASTA format directly). You can do the conversion online, e.g. here. There are two options: Extract Individual Features (as amino acid or nucleotide sequences) or Extract Whole Sequence (nucleotides). If you choose the former and get an amino acid FASTA, MetaMiner will interpret it as option (4), and the processing will be very slow, since all your amino acid sequences will be considered potential RiPPs and thoroughly scored, while only a small fraction of all CDSs usually encode real RiPPs. If you choose the latter (Extract Whole Sequence), you will end up with MetaMiner option (1), which is the recommended way to run the tool.
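The "Extract Whole Sequence" conversion can also be done locally. Below is a minimal standard-library sketch, assuming a well-formed GenBank flat file where each record starts with a LOCUS line, carries its nucleotide sequence after ORIGIN, and ends with `//`; the function name `genbank_to_fasta` is hypothetical. If Biopython is installed, `Bio.SeqIO.convert(in_path, "genbank", out_path, "fasta")` does the same job.

```python
import re


def genbank_to_fasta(gbk_text, line_width=60):
    """Convert each GenBank record's ORIGIN sequence into a nucleotide FASTA entry."""
    fasta_lines = []
    record_index = 0
    name = None
    seq_parts = []
    in_origin = False
    for line in gbk_text.splitlines():
        if line.startswith("LOCUS"):
            # The locus name (second field) becomes the FASTA header.
            fields = line.split()
            name = fields[1] if len(fields) > 1 else None
            seq_parts, in_origin = [], False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            record_index += 1
            seq = "".join(seq_parts).upper()
            fasta_lines.append(">" + (name or f"record_{record_index}"))
            fasta_lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
            name, seq_parts, in_origin = None, [], False
        elif in_origin:
            # Sequence lines look like "       61 gatcgatc gatc...": keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return "\n".join(fasta_lines) + ("\n" if fasta_lines else "")
```

The output contains the whole genome as nucleotides, so it corresponds to option (1) rather than the much slower per-feature protein input of option (4).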
I downloaded your GBK file and converted it both ways. After that, I restarted the GNPS jobs as option (1) (nucleotide, full genome) and option (4) (protein, features only). The first job completed very quickly and found one relatively good match (albeit still not very trustworthy, since it is just above the minimum quality threshold). The second job has been running for more than 7 hours and is still not finished.
My thoughts on this issue and future MetaMiner releases: we will accept regular .gbk files, parse them, and, if they are not antiSMASH output files, treat them as in option (1), i.e. we will do the conversion from GBK to nucleotide sequence automatically.

Thank you for your kind help!
I am sorry to report a failed result again, even though I followed your suggestions: the sequence file was uploaded as a .fasta file converted online from the antiSMASH .gbk result.
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=af523f0c813e42ba801a242fc6c4af55
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=38483845bcca4d76a05e8efcea14f609
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=d59ca9169cad4c9182087f3520feb421
Thank you in advance for your kind help!
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=65694aaf25964885930fd3e4145b4fbb
Thank you very much!