apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Can geNomad be applied to metagenome bins? #22

Closed qkqk-hub closed 1 year ago

qkqk-hub commented 1 year ago

Hello, I have some metagenomic data and obtained bins after processing it with MEGAHIT and MetaBAT 2. Can I run geNomad on the bins to identify viruses? Which is the better input for geNomad: the contigs assembled by MEGAHIT or the bins produced by MetaBAT 2? I have a lot of memory and cpu, can I speed things up? Do you have any other suggestions? Thanks!

apcamargo commented 1 year ago

Hi @qkqk-hub

You can absolutely do that. I'd suggest the following:

  1. Filter out very short sequences (e.g. shorter than 1.5 kb), since their classification is not super reliable and they won't bin well anyway.
  2. Classify the whole metagenome, so you will have all the viruses, including the unbinned ones. This is important because a complete viral genome might sit in a single scaffold (you won't need bins for those).
  3. Identify the bins that contain viral contigs and compute, for each bin, the mean virus score weighted by contig length. Weighting by length prevents you from flagging a bacterial bin with a single phage contig as a phage bin.
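
To make steps 1 and 3 concrete, here is a minimal Python sketch of the length-weighted mean. It is not part of geNomad: the function name, the `(bin_id, length, virus_score)` tuple layout, and the toy numbers are all hypothetical. In practice the virus score would come from geNomad's classification of each contig, and the bin assignment and contig length from your binning and assembly outputs.

```python
from collections import defaultdict

MIN_LENGTH = 1500  # the 1.5 kb cutoff suggested in step 1


def weighted_virus_score_per_bin(contigs, min_length=MIN_LENGTH):
    """Return {bin_id: length-weighted mean virus score}.

    `contigs` is an iterable of (bin_id, length, virus_score) tuples.
    This layout is an assumption made for illustration only.
    """
    weighted_sum = defaultdict(float)
    total_length = defaultdict(int)
    for bin_id, length, virus_score in contigs:
        if length < min_length:
            continue  # step 1: skip very short sequences
        weighted_sum[bin_id] += length * virus_score
        total_length[bin_id] += length
    # step 3: divide the length-weighted sum by the total bin length
    return {b: weighted_sum[b] / total_length[b] for b in total_length}


# Toy data (made up): bin_1 is mostly bacterial with one phage-like contig,
# bin_2 consists almost entirely of high-scoring contigs.
contigs = [
    ("bin_1", 200_000, 0.02),
    ("bin_1", 40_000, 0.95),
    ("bin_2", 30_000, 0.97),
    ("bin_2", 10_000, 0.90),
    ("bin_2", 800, 0.10),  # below the cutoff, ignored
]
scores = weighted_virus_score_per_bin(contigs)
print(scores)
```

With these toy numbers, the single phage-like contig in `bin_1` is outweighed by the long low-scoring contig, so the bin's score stays low, while `bin_2` keeps a high score. Any threshold for calling a bin viral is left to you; none is implied here.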

In the future I might implement native support for bins. In the meantime, the steps above should give you some good results.

> I have a lot of memory and cpu, can I speed things up?

geNomad should be able to leverage your hardware with default parameters :)

qkqk-hub commented 1 year ago

Thank you very much for your reply. I understand.