**Databases and auxiliary files can be found -> here <- or ->here<-**
**The file used for coverage should be formatted as follows (the CheckM lineage is optional):
binname (tab) coverage (tab) checkm_lineage
where binname is the name of the bin in the directory.**
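As an illustration of the format above, here is a minimal Python sketch that reads such a coverage file. It assumes there is no header line and that the coverage column is numeric; neither is stated above. The `BinCoverage` type, the `parse_coverage` helper and the example bin names and values are hypothetical and not part of Metascan.

```python
import csv
import io
from typing import NamedTuple, Optional


class BinCoverage(NamedTuple):
    binname: str
    coverage: float
    checkm_lineage: Optional[str]  # None when the optional third column is absent


def parse_coverage(handle):
    """Parse tab-separated lines: binname, coverage, optional checkm_lineage."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least 2 columns, got: {row!r}")
        # Assumption: the coverage column is numeric.
        lineage = row[2].strip() if len(row) > 2 and row[2].strip() else None
        records.append(BinCoverage(row[0].strip(), float(row[1]), lineage))
    return records


# Hypothetical example content; the bin names and values are made up.
example = "bin1.fa\t12.5\tk__Bacteria\nbin2.fa\t3.0\n"
records = parse_coverage(io.StringIO(example))
for rec in records:
    print(rec.binname, rec.coverage, rec.checkm_lineage)
```

Running a check like this before starting Metascan can catch lines that use spaces instead of tabs or that lack a coverage value.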
Metabolic scanning and annotation of Metagenomes
Metascan is a metagenomic scanning and annotation tool with an emphasis on metabolic genes. The heart of Metascan is a set of metabolic core genes that are used to paint a picture of the metabolic capacity of the sample. Furthermore, it uses the KEGG pathways for a complete metabolic overview of each sample.
Samples can be analyzed as either binned or unbinned metagenomes.
Metascan consists of a Perl script, a few auxiliary (text) files and a set of HMM profiles, created by clustering TrEMBL proteins based on KEGG K-numbers.
Since it is a Prokka adaptation, it can be run on any system that can already run Prokka, just by downloading the script and the databases.
The only modification that needs to be made is to point the script to the location of the database.
So, what can it do?
First and foremost, Metascan is intended to give an overview of the metabolic processes within a given sample (metagenome). This could be an unbinned core assembly, but it can also be a binned metagenome, in which case you should not forget to include the unbinned leftover contigs as a bin in your analysis, as these are still part of the metagenome.
Besides analysing the sample as a whole, Metascan can also be used to completely annotate genomes (with an emphasis on metabolic processes).
Thirdly, Metascan can be used to retrieve specific genes from a metagenome. Users can also submit their own genes (not necessarily metabolic ones) for Metascan to search for and retrieve in FASTA format.
Fourth (coming up in metascan2) is the option to search for viral contigs/areas within the metagenome (although not yet fully validated).
How does Metascan work?
Metascan works on the basis of ~180 different metabolic genes that are key genes or otherwise important genes in metabolic processes. McrA, for instance, is needed for the conversion of methyl-CoM into methane. Without mcrA, there is no methanogenesis. Therefore, the amount of mcrA in a sample can be seen as a measure of the potential for methanogenesis. The complete key-gene set is divided into 8 main subsets:
Most of these subsets are composed of different processes, and these processes are often formed by modules. For more information about the processes and modules, I refer to the KEGG website, as the data and setup used in Metascan come directly from KEGG. This also makes it easy to load your data into the KEGG website for further analysis and a visual reference: https://www.genome.jp/kegg/mapper.html
Can I add my own genes?
Yes, you can. Metascan has an option, `--hmms`, that enables you to use your own HMM profile. You can choose to use it together with all the datasets or with none at all. The latter makes Metascan convenient for finding specific genes of interest in (large meta-)genomes. You will, however, need an HMM profile to do so. If you already have one, or you found one online, then Bob’s your uncle.
If, however, you have a new gene, or you want to update an old one, you’ll need to make a profile yourself. First, gather all the relevant protein FASTA sequences of your gene of interest. Once you have those, align them. The resulting alignment can then be used with hmmbuild to create an HMM profile. After indexing the HMM profile with hmmpress, your profile is ready to go.
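The align, build and index steps above can be sketched in Python as follows. This is only an outline under a few assumptions: MAFFT is used as the aligner (no aligner is prescribed here, and any tool that writes an alignment hmmbuild can read would do), and the file names and the `profile_commands` and `run_commands` helpers are hypothetical. The script only prints the commands; calling `run_commands(commands)` would execute them, provided the tools are on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def profile_commands(fasta, name, workdir="."):
    """Return (argv, stdout_path) pairs for align -> hmmbuild -> hmmpress."""
    work = Path(workdir)
    aln = work / f"{name}.aln.fasta"
    hmm = work / f"{name}.hmm"
    return [
        # Assumption: MAFFT as aligner. It writes the alignment to stdout,
        # so the output is redirected to a file.
        (["mafft", "--auto", str(fasta)], aln),
        # hmmbuild takes the output HMM file first, then the alignment.
        (["hmmbuild", str(hmm), str(aln)], None),
        # hmmpress writes its binary index files next to the HMM file.
        (["hmmpress", str(hmm)], None),
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_path in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Hypothetical input file and profile name.
commands = profile_commands("mygene_proteins.faa", "mygene")
for argv, stdout_path in commands:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```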
Please be aware that multiple HMMs need to be in one file. If you have multiple profiles, you can use the command cat to combine them.
When you are finished, add one line to the start of the HMM profile so that Metascan can write the generated data to the overview files:
#CYCLE (tab) name
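Combining several profiles into one file and adding the #CYCLE line can also be done in a few lines of Python. The sketch below is illustrative only: `merge_profiles` is a hypothetical helper, "my_cycle" is a made-up label, and the demonstration writes placeholder text rather than real HMMER profiles.

```python
import tempfile
from pathlib import Path


def merge_profiles(profile_paths, cycle_name, out_path):
    """Concatenate HMM files and prepend the '#CYCLE<tab>name' line."""
    with open(out_path, "w") as out:
        out.write(f"#CYCLE\t{cycle_name}\n")
        for path in profile_paths:
            text = Path(path).read_text()
            # Make sure each profile ends with a newline before the next starts.
            out.write(text if text.endswith("\n") else text + "\n")
    return out_path


# Demonstration with placeholder files (not real HMMER profiles).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "a.hmm")
    second = Path(tmp, "b.hmm")
    first.write_text("placeholder profile A\n//\n")
    second.write_text("placeholder profile B\n//")
    merged_path = merge_profiles([first, second], "my_cycle", Path(tmp, "custom.hmm"))
    merged = Path(merged_path).read_text()

print(merged.splitlines()[0])
```

The merged file is what you would then pass to the --hmms option.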
Dependencies:
Prokka:
Optional:
To be removed from Metascan:
To be added:
- different E-values for each cycle database