apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Sorry I have one more question #62

Closed songmj86 closed 5 months ago

songmj86 commented 6 months ago

Hi, I have another question to ask!

Bacterial MAGs are mostly comprised of multiple fragmented contigs/scaffolds.

If one of those contigs/scaffolds is annotated as a plasmid by geNomad (for example, with a plasmid score > 0.9), can I regard that contig/scaffold as plasmid rather than chromosome?

Thank you very much !

apcamargo commented 6 months ago

Good question. You could take a length-weighted average of the contigs' scores.
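A minimal sketch of that length-weighted average, assuming you have already collected each contig's length and plasmid score yourself. The function name and the example numbers are hypothetical and are not part of geNomad:

```python
def length_weighted_score(contigs):
    """Return the length-weighted mean score of a bin.

    `contigs` is an iterable of (length_in_bp, score) pairs. How you
    obtain them (e.g. from a classification table) is up to you; this
    helper is illustrative only.
    """
    contigs = list(contigs)  # allow generators, since we iterate twice
    total_length = sum(length for length, _ in contigs)
    if total_length == 0:
        raise ValueError("total contig length must be greater than zero")
    return sum(length * score for length, score in contigs) / total_length


# Hypothetical bin: one long low-scoring contig, one short high-scoring contig.
example_bin = [(100_000, 0.05), (5_000, 0.95)]
bin_score = length_weighted_score(example_bin)
print(f"{bin_score:.4f}")
```

With these made-up values the short, high-scoring contig barely moves the bin-level score, because the long contig dominates the weighting.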

Alternatively, you can concatenate them before classification, adding a couple of Ns between each contig (to prevent gene calls extending across different contigs). This would allow geNomad to leverage the information of all contigs when performing classification.

> If one of those contigs/scaffolds is annotated as a plasmid by geNomad (for example, with a plasmid score > 0.9), can I regard that contig/scaffold as plasmid rather than chromosome?

In short, no. Plasmid contigs may be binned together with chromosomal sequences, so it is not advisable to assume that the entire bin represents plasmid content.

songmj86 commented 6 months ago

Thanks for your quick response.

If I choose the alternative option, I would need to follow these steps, for example:

Step 1. Prepare the contigs to be concatenated

Contig1 ATCCGCATC ... ATCCGCATC
Contig2 CTGACGTAC ... CTGACGTAC

Step 2. Attach five Ns to both ends of each contig

Contig1 NNNNNATCCGCATC ... ATCCGCATCNNNNN
Contig2 NNNNNCTGACGTAC ... CTGACGTACNNNNN

Step 3. Run geNomad

Did I understand correctly?

I sincerely appreciate your help !

apcamargo commented 6 months ago

You need to concatenate the contigs into a single sequence, so that geNomad will process the whole thing as one entity. Like this:

>seq
<contig 1 sequence>NNNNNNNNNN<contig 2 sequence>NNNNNNNNNN<contig 3 sequence>...
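The concatenation described above could be sketched as follows. This is an illustration rather than part of geNomad: `read_fasta` and `concatenate_contigs` are hypothetical helper names, and the default spacer of ten Ns simply mirrors the example above.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_contigs(records, spacer_length=10):
    """Join all sequences into one string, separated by runs of Ns."""
    spacer = "N" * spacer_length
    return spacer.join(sequence for _, sequence in records)


# Toy input standing in for the contigs of one MAG.
toy_fasta = [">contig1", "ATCCGCATC", ">contig2", "CTGACG", "TAC"]
merged = concatenate_contigs(read_fasta(toy_fasta))
print(">seq")
print(merged)
```

The spacer is placed only between contigs, not at the start or end of the merged sequence. The resulting single-record FASTA is what would be passed to geNomad for classification.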