RasmussenLab / vamb

Variational autoencoder for metagenomic binning
MIT License

Format of taxonomy tsv needed for taxvamb #299

Open prototaxites opened 4 months ago

prototaxites commented 4 months ago

Hi! I appreciate I may be running a bit full-steam ahead here by asking about software that isn't published yet. I was looking at running vamb and noticed there is an option to use taxonomy information during binning via taxvamb.

I already have mmseqs taxonomy databases for each of my metagenome assemblies, and can easily get these to output taxonomy TSVs. I just wanted to check which fields are actually needed in the TSV. I see in the taxvamb snakemake workflow that mmseqs taxonomy is called with --tax-lineage 1, whereas mine were created with no additional arguments. Is the additional column with the full lineage information required in order to use the software?

jakobnissen commented 3 months ago

@sgalkina Is it possible to amend the parser at parse_taxonomy such that 1) the expected format can be read from the code, and 2) it guards against the format changing in the future (e.g. if mmseqs decides to add a column somewhere in its output)?
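
To make the two requests concrete, here is a minimal sketch of what a self-documenting, defensive parser could look like. It is not vamb's actual parse_taxonomy. The names `TaxonomyRecord` and `parse_taxonomy_lines` are hypothetical, and the assumed layout (five tab-separated columns, with the contig name first and a semicolon-separated lineage last) is an illustration only and should be checked against the real mmseqs output and the real parser.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class TaxonomyRecord(NamedTuple):
    """One parsed row: the contig name and its lineage, root first."""
    contig: str
    lineage: Tuple[str, ...]


def parse_taxonomy_lines(
    lines: Iterable[str],
    contig_col: int = 0,        # ASSUMPTION: contig name is the first column
    lineage_col: int = 4,       # ASSUMPTION: lineage is the fifth column
    expected_columns: int = 5,  # ASSUMPTION: rows have exactly five columns
) -> Iterator[TaxonomyRecord]:
    """Parse tab-separated taxonomy rows.

    Assumed (hypothetical) layout, one row per contig:
        contig <TAB> taxid <TAB> rank <TAB> name <TAB> lineage
    where lineage is a semicolon-separated list of taxa.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        # Fail loudly if the column count differs from what we expect,
        # instead of silently reading the wrong column.
        if len(fields) != expected_columns:
            raise ValueError(
                f"Line {lineno}: expected {expected_columns} tab-separated "
                f"columns, found {len(fields)}"
            )
        contig = fields[contig_col]
        if not contig:
            raise ValueError(f"Line {lineno}: empty contig name")
        # An empty lineage field yields an empty tuple (unclassified contig).
        lineage = tuple(t for t in fields[lineage_col].split(";") if t)
        yield TaxonomyRecord(contig, lineage)
```

With this design, a file produced without the lineage column (or with an extra column added upstream) raises an error that names the offending line, and the expected format is stated in one place in the docstring and the default arguments.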