Open wfgui opened 1 month ago
There is currently no option to do this, but I could implement it as a feature in the future. In the meantime, you can obtain the nucleotide sequences of the CDSs by extracting them from the genomes using the gene coordinates.
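As a rough illustration of that workaround, here is a minimal sketch of pulling one CDS out of a contig from its coordinates. It assumes 1-based, inclusive start/end positions and a strand encoded as 1/-1 (or +/-); check these against the gene table you are using. The function names `reverse_complement` and `extract_cds` are made up for this example and are not part of geNomad.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def extract_cds(contig_seq: str, start: int, end: int, strand) -> str:
    """Extract a CDS from a contig.

    Assumption: start and end are 1-based and inclusive, with start <= end,
    and strand is 1 / -1 (or "+" / "-").
    """
    # Convert 1-based inclusive coordinates to a Python slice.
    subseq = contig_seq[start - 1:end]
    # Genes on the reverse strand must be reverse-complemented.
    if str(strand) in ("-1", "-"):
        return reverse_complement(subseq)
    return subseq


# Toy contigs, not real data.
forward = extract_cds("AAATGCCCTAAGG", 3, 11, 1)
reverse = extract_cds("TTAGGGCATCC", 1, 9, -1)
print(forward)
print(reverse)
```

In practice you would load the contigs from the genome FASTA and loop over the rows of the gene table, writing one FASTA record per gene.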
I also had a seemingly simple question: can I reformat the taxonomy output, for example converting it to k__; p__; c__; o__; f__; g__; s__?
Thanks!
You can use taxopy for that. geNomad's taxdump is inside the database directory, and you can find the TaxIds in the <prefix>_annotate/<prefix>_taxonomy.tsv file.
For instance:
import taxopy

taxdb = taxopy.TaxDb(
    nodes_dmp="genomad_db/nodes.dmp",
    names_dmp="genomad_db/names.dmp",
    keep_files=True
)
taxon = taxopy.Taxon(5797, taxdb)
for rank, name in reversed(taxon.ranked_name_lineage):
    if name != "root":
        print(f"{rank}__{name}")
realm__Duplodnaviria
kingdom__Heunggongvirae
phylum__Uroviricota
class__Caudoviricetes
order__Crassvirales
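If you want a single prefixed string instead of one rank per line, something like the sketch below might work. It takes (rank, name) pairs like the ones printed above and joins them as k__...;p__...;...;s__, leaving the name empty for ranks missing from the lineage. The helper `format_lineage` and the choice of ranks and one-letter prefixes are assumptions for this example; adjust them if you also want the realm.

```python
# Ranks to report, in order, each mapped to its one-letter prefix.
# This selection is an assumption; add ("realm", "r") if needed.
DEFAULT_RANKS = [
    ("kingdom", "k"), ("phylum", "p"), ("class", "c"), ("order", "o"),
    ("family", "f"), ("genus", "g"), ("species", "s"),
]


def format_lineage(ranked_lineage, ranks=DEFAULT_RANKS, sep=";"):
    """Join (rank, name) pairs into a prefixed lineage string.

    Ranks that are absent from the lineage get an empty name (e.g. "g__").
    """
    names = dict(ranked_lineage)
    return sep.join(f"{prefix}__{names.get(rank, '')}" for rank, prefix in ranks)


# (rank, name) pairs copied from the output shown above.
lineage = [
    ("order", "Crassvirales"),
    ("class", "Caudoviricetes"),
    ("phylum", "Uroviricota"),
    ("kingdom", "Heunggongvirae"),
    ("realm", "Duplodnaviria"),
]
print(format_lineage(lineage))
# k__Heunggongvirae;p__Uroviricota;c__Caudoviricetes;o__Crassvirales;f__;g__;s__
```

With taxopy you could pass taxon.ranked_name_lineage directly as the first argument.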
What's the difference between "Unclassified" and "Viruses;;;;;;" ?
"Unclassified" means that the genes in the sequence had no matches to markers with taxonomy information. "Viruses" means that the classification is uncertain at a high rank.
Hi, in the example above I can see the protein FASTA file GCF_009025895.1_virus_proteins.faa. I want to calculate gene abundance using the viral gene sequences. Can geNomad output the corresponding nucleotide sequences?
Thanks!