gcremers / metascan

Metabolic scanning and annotation of Metagenomes
GNU General Public License v3.0

How to run Metascan #1

saras224 opened 1 year ago

saras224 commented 1 year ago

Hi @gcremers! I tried to install your tool metascan to annotate my Metagenome Assembled Genomes (MAGs), but I do not know where I should place all my genomes. Does it take multiple genomes as input, and what kind of output will it produce for multiple genomes? Will each genome get its own annotated output file, or will it produce a merged output for all the MAGs provided as input?

Kindly help me navigate the annotation of MAGs with your tool, and please provide a short example of command usage.

Thanks in Advance!!!

gcremers commented 1 year ago

Hi saras,

All MAGs go into 1 folder. From there you can run metascan, for instance with:

metascan .

(when run from within that same folder, so the input is technically a folder). It will then run metascan with its default settings.

metascan . --nokegg (will not use the full KEGG metabolic database. This is faster, but it will only annotate the key genes in the genomes).

metascan . --norrna (This is a lot faster, as it will skip the step that annotates the 16S/23S genes with BLAST). If you used a pipeline to bin your MAGs, you most likely have phylogenetic information for the bins anyway.

metascan . --depth coverage_filename Speaking of which: if you have a file with the names of the MAGs and the coverage of those MAGs, you can use this option to feed it to metascan. It will then incorporate the coverage into the output. The file needs to be MAG-name <tab> coverage, as explained on the main page.
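
For illustration only, such a depth file could look something like this (tab-separated, one MAG per line; the names and coverage values below are made up, so check the main page for the exact naming convention):

MAG1 <tab> 55.2
MAG2 <tab> 12.8
MAG3 <tab> 3.1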

metascan . --prokka Whenever I use metascan for a complete annotation (including everything), I use this one. It will take the longest, so I usually use it for a single genome.

More options are available with: metascan . --help

Depending on how in-depth you want to go and how long you want to spend on it, you can combine these options. metascan . --nokegg --norrna --depth coverage_filename will be fast, but will mainly give an overview of the sample as a whole.

metascan . --prokka --depth coverage_filename will result in a full annotation, but will take more time.

The output covers most of the above: you get annotations for each separate MAG, but you will also get overview files for the MAGs as a whole.

saras224 commented 1 year ago

Thanks @gcremers for your prompt response, but I am not able to run metascan with that type of command. I can only run metascan.pl and then give the name of the whole directory where the MAGs are present. But now I am encountering this error:

SLURM_JOBID=210158
SLURM_JOB_NODELIST=node8
SLURM_NNODES=1
SLURMTMPDIR=
Date = Mon May 22 19:49:04 IST 2023
Hostname = node8

Number of Nodes Allocated = 1
Number of Tasks Allocated = 20
Number of Cores/Task Allocated =
Working Directory = /home/saraswati/metascan
working directory = /home/saraswati/metascan
[19:49:04] Looking for 'barrnap' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/barrnap
[19:49:04] Determined barrnap version is 0.9
[19:49:04] Looking for 'blastn' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/blastn
[19:49:06] Determined blastn version is 2.14
[19:49:06] Looking for 'egrep' - found /usr/bin/egrep
[19:49:06] Looking for 'find' - found /usr/bin/find
[19:49:06] Looking for 'grep' - found /usr/bin/grep
[19:49:06] Looking for 'hmmpress' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/hmmpress
[19:49:06] Determined hmmpress version is 3.3
[19:49:06] Looking for 'hmmsearch' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/hmmsearch
[19:49:06] Determined hmmsearch version is 3.3
[19:49:06] Looking for 'less' - found /usr/bin/less
[19:49:06] Looking for 'parallel' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/parallel
[19:49:07] Determined parallel version is 20230322
[19:49:07] Looking for 'prodigal' - found /home/saraswati/conda/anaconda3/envs/prokka/bin/prodigal
[19:49:07] Determined prodigal version is 2.6
[19:49:07] Looking for 'sed' - found /usr/bin/sed
[19:51:12] Metascan found 9 fasta files to analyse
[19:51:12] .*.....**METASCAN***...*...
[19:51:12] This is metascan.pl 1.2
[19:51:12] Written by G. Cremers
[19:51:12] Homepage is gitlab.science.ru.nl/gcremers/metascan
[19:51:12] Local time is Mon May 22 19:49:04 2023
[19:51:12] You are saraswati
[19:51:12] Operating system is linux
[19:51:12] Command: ./metascan.pl /home/saraswati/DRAM_TRIAL_ANNOTATION/
[19:51:12] Output E-value setting for HMM: Without Kegg: 1e-100 With Kegg: 1e-50
[19:51:12] E-value cut off for RNA and small proteins: 1e-06
[19:51:12] Cut-off value for a protein to be considered small: 200
[19:51:12] Size range Query-Target length: 20 %
[19:51:12] Size range Query-Target length partials: 30 %
[19:51:12] Working directory: /home/saraswati/metascan
[19:51:12] Generating locus_tag from '/home/saraswati/DRAM_TRIAL_ANNOTATION//A5_bin.15.fa' contents.
[19:51:12] Setting --locustag NIIKEOPD from MD5 7224e89d0bbffd476427b828dc5a6458
[19:51:12] Creating new output folder: /home/saraswati/DRAM_TRIAL_ANNOTATION//NIIKEOPD
[19:51:12] Running: mkdir -p \/home\/saraswati\/DRAM_TRIAL_ANNOTATION\/\/NIIKEOPD
Preparing setup data from files
Kegg files
Hydrogen files
[19:51:12] Creating new output folder: /home/saraswati/DRAM_TRIAL_ANNOTATION//NIIKEOPD/hydrogenases
[19:51:12] Running: mkdir -p \/home\/saraswati\/DRAM_TRIAL_ANNOTATION\/\/NIIKEOPD\/hydrogenases
[19:51:13] Using filename prefix: NIIKEOPD.XXX
[19:51:13] Writing log to: /home/saraswati/DRAM_TRIAL_ANNOTATION//NIIKEOPD/NIIKEOPD.log
[19:51:13] Loading and checking input file: /home/saraswati/DRAM_TRIAL_ANNOTATION//A5_bin.15.fa
[19:51:13] Wrote 259 contigs totalling 2254079 bp.
[19:51:13] Setting HMMER_NCPU=1
[19:51:13] Using genetic code table 11.
[19:51:13] You have BioPerl 1.7.8
Argument "1.7.8" isn't numeric in numeric lt (<) at ./metascan.pl line 651.
[19:51:13] System has 40 cores.
[19:51:13] Will use maximum of 8 cores.
[19:51:13] Annotating as >>> Bacteria <<<
[19:51:13] Appending to PATH: /home/saraswati/metascan
[19:51:13] Predicting Ribosomal RNAs
[19:51:13] Running Barrnap with 8 threads
[19:51:14] Found 0 rRNAs
[19:51:14] Total of 0 tRNA + rRNA features
[19:51:14] Predicting coding sequences
[19:51:14] Contigs total 2254079 bp, so using single mode
[19:51:14] Running: prodigal -i \/home\/saraswati\/DRAM_TRIAL_ANNOTATION\/\/NIIKEOPD\/NIIKEOPD.fna -m -g 11 -p single -f sco -q
[19:51:19] Found 2356 CDS
[19:51:19] Connecting features back to sequences
[19:51:19] Preparing HMMER annotation source
[19:51:19] Your HMM is not indexed, please run: hmmpress /lustre/saraswati/metascan_db//nitro.cycle.sub.hmm #*Make sure to use the full path

The last error says that the full path was not provided, although that already is the absolute path.

Please let me know what can be done.. Thanks! Saras :)

gcremers commented 1 year ago

You still need to index the HMM files with hmmpress.

see https://manpages.ubuntu.com/manpages/bionic/man1/hmmpress.1.html

In your case hmmpress /lustre/saraswati/metascan_db//nitro.cycle.sub.hmm

You'll need to do that for all HMM files.
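
If all the database files sit in one folder, you could index them in one go with a shell loop along these lines (a sketch only, assuming the /lustre/saraswati/metascan_db/ path from your log and that every database file ends in .hmm):

# assumes every metascan HMM database lives in /lustre/saraswati/metascan_db/
for f in /lustre/saraswati/metascan_db/*.hmm; do hmmpress "$f"; done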

Edit: If you restart metascan, add --force to the command line. Otherwise it will stop when it tries to create the folder that already exists. (Or delete the created folder NIIKEOPD/; that works equally well.)
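
For instance, matching the command from your log, a restart could look something like:

./metascan.pl /home/saraswati/DRAM_TRIAL_ANNOTATION/ --force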

saras224 commented 1 year ago

Hi @gcremers, thanks for being patient and trying to resolve my issue, but I am still getting the same error. The installed version of hmmer is 3.3.2; could the error be caused by the version, given that I am already providing the absolute path?

When I try to index separately with hmmpress, it shows this error:

command: hmmpress -f /lustre/saraswati/metascan_db/nitro.cycle.sub.hmm
error: Working... SSI index construction failed: primary keys not unique: 'A0A089XMA9' occurs more than once

Thanks Saras

gcremers commented 1 year ago

Hi Saras,

Sorry for the delay.

This does sound like a version issue. HMMER changed after 3.1. You can download version 3.1 from http://eddylab.org/software/hmmer/hmmer-3.1b2.tar.gz

A description on how to install can also be found on that site. After installation, you can use the full path of hmmpress to try indexing again with 3.1.
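
As a rough sketch (the install prefix here is just an example; the exact steps are described in the HMMER documentation that ships with the tarball):

# example only: build HMMER 3.1b2 from source and install it under $HOME/hmmer31
wget http://eddylab.org/software/hmmer/hmmer-3.1b2.tar.gz
tar -xzf hmmer-3.1b2.tar.gz
cd hmmer-3.1b2
./configure --prefix=$HOME/hmmer31
make
make install
# then index with the 3.1 binary by its full path, e.g.:
$HOME/hmmer31/bin/hmmpress /lustre/saraswati/metascan_db/nitro.cycle.sub.hmm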