It's strange to have a sequence with zero genes classified as a plasmid. Can you share some of those with me? I want to take a look to better understand what might be going on.
Sure, here are some rows from contigs_plasmid_summary.tsv:
seq_name | length | topology | n_genes | genetic_code | plasmid_score | fdr | n_hallmarks | marker_enrichment | conjugation_genes | amr_genes |
---|---|---|---|---|---|---|---|---|---|---|
c_000000004590 | 2717 | No terminal repeats | 0 | 11 | 0.732 | NA | 0 | 0 | NA | NA |
c_000000008890 | 2543 | No terminal repeats | 0 | 11 | 0.7117 | NA | 0 | 0 | NA | NA |
c_000000017907 | 2942 | No terminal repeats | 0 | 11 | 0.7887 | NA | 0 | 0 | NA | NA |
c_000000036041 | 2807 | No terminal repeats | 0 | 11 | 0.7324 | NA | 0 | 0 | NA | NA |
Also, I forgot to mention that I used the "genomad end-to-end --cleanup" command for my dataset.
These sequences are too long to have no genes at all. Could you share the FASTA file with me?
Hi, sorry for the delay. Yes, here is a subset that includes the sequences mentioned above. I converted it to .txt so I could upload it: contigs_plasmid.txt
Thank you!
What is happening here is that the neural network is classifying these sequences as plasmids. Because they don't have any markers, the neural network classifier gets a higher weight than the marker-based classifier (which doesn't classify them as plasmids). I'll implement an additional filter for the next release.
Right now, you can use `marker_enrichment` to remove these cases. Just remove rows where `marker_enrichment == 0`.
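The workaround above can be sketched in Python with only the standard library. This is a minimal sketch, not part of geNomad: the helper names `has_markers` and `drop_zero_marker_rows` are hypothetical, and the code assumes the summary is a tab-separated file with a header row that includes a `marker_enrichment` column, as in the table earlier in this thread. As a design choice, rows whose `marker_enrichment` is not numeric (for example `NA`) are kept rather than silently dropped.

```python
import csv
import io
from typing import Iterable


def has_markers(value: str) -> bool:
    """True unless the marker_enrichment value parses to exactly zero."""
    try:
        return float(value) != 0.0
    except ValueError:
        # Assumption: keep non-numeric values such as "NA" instead of dropping them.
        return True


def drop_zero_marker_rows(lines: Iterable[str]) -> str:
    """Return the summary TSV as text, minus rows where marker_enrichment == 0.

    Hypothetical helper (not a geNomad API). `lines` can be any iterable of
    text lines, e.g. an open handle for contigs_plasmid_summary.tsv.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep only plasmid calls backed by at least some marker enrichment.
        if has_markers(row["marker_enrichment"]):
            writer.writerow(row)
    return out.getvalue()
```

To use it on a real run, open contigs_plasmid_summary.tsv with `newline=""` (as the `csv` module recommends), pass the handle to `drop_zero_marker_rows`, and write the returned text to a new file of your choosing.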
Thanks much for your help!
I just released version 1.8.0, which has new options and default parameters that will avoid cases like this.
Hello,
First, thanks for developing geNomad; it ran smoothly on my metagenomic dataset from lakes.
After running geNomad, the summary reported:
"15,657 plasmid(s) and 6,922 virus(es) were identified"
When I looked at contigs_plasmid_summary.tsv, I counted 15,657 rows, which I understand are the plasmids identified by geNomad. Summing the n_genes column gives 66,989 genes in total, which matches the number of rows in contigs_plasmid_genes.tsv, so the gene count for the plasmids is clear to me. However, several plasmids in contigs_plasmid_summary.tsv have n_genes equal to zero. Should those be interpreted as empty plasmids?
Thanks again
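The consistency check described in the question (summing n_genes and comparing it with the number of rows in the genes table) can be sketched the same way, and the same pass can count the zero-gene plasmids. This is a hypothetical helper, not a geNomad command; it assumes both files are tab-separated with a single header row and that the summary has an `n_genes` column.

```python
import csv
from typing import Iterable, Tuple


def tally_plasmid_genes(
    summary_lines: Iterable[str], gene_lines: Iterable[str]
) -> Tuple[int, int, int]:
    """Return (sum of n_genes, data rows in the genes table, plasmids with n_genes == 0).

    Hypothetical helper; inputs are iterables of text lines, e.g. open handles
    for contigs_plasmid_summary.tsv and contigs_plasmid_genes.tsv.
    """
    total_genes = 0
    zero_gene_plasmids = 0
    for row in csv.DictReader(summary_lines, delimiter="\t"):
        n_genes = int(row["n_genes"])
        total_genes += n_genes
        if n_genes == 0:
            zero_gene_plasmids += 1
    # DictReader consumes the header, so this counts data rows only.
    gene_rows = sum(1 for _ in csv.DictReader(gene_lines, delimiter="\t"))
    return total_genes, gene_rows, zero_gene_plasmids
```

For the run described in the question, the first two values should both be 66,989; the third value is the number of zero-gene plasmid calls discussed earlier in this thread.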