Closed (bcpd closed this 4 months ago)
Hi @bcpd,
You can do this with sylph. For example, if you want to profile both eukaryotes and prokaryotes, the correct way is to pass both databases to a single profile command:
sylph profile eukaryotes.syldb prokaryotes.syldb (samples)
This concatenates the databases, so all genomes are profiled together. If you instead profiled each database separately and merged the results, the relative abundances would not be comparable (e.g. eukaryote species A may have 50% abundance relative to other eukaryotes, but only 1% abundance across all prokaryote and eukaryote species).
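To make the 50% versus 1% point concrete, here is a small arithmetic sketch. The coverage numbers and genome names are hypothetical, chosen only to reproduce the figures above, and the normalization is a simplification rather than sylph's exact computation.

```shell
# Hypothetical coverage values (not real sylph output).
# Columns: database, genome, coverage
abundances=$(printf '%s\n' \
  'euk eukA 5' \
  'euk eukB 5' \
  'prok bacteria_total 490' |
awk '
  # Accumulate coverage per database and across all databases.
  { db[NR] = $1; name[NR] = $2; cov[NR] = $3; per_db[$1] += $3; total += $3 }
  END {
    # within_db: normalized inside one database (what a post-hoc merge would carry over)
    # combined:  normalized across both databases (what joint profiling reflects)
    for (i = 1; i <= NR; i++)
      printf "%s within_db=%.1f%% combined=%.1f%%\n", name[i], 100 * cov[i] / per_db[db[i]], 100 * cov[i] / total
  }')
echo "$abundances"
```

With these made-up numbers, eukA is 50% of the eukaryote-only total but 1% of the combined total, which is why per-database percentages cannot simply be merged.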
Let me know if you have any other questions,
Jim
Excellent - thank you! A related question: for taxonomic profiling, would sylph_to_taxprof.py work with multiple metadata files?
@bcpd sylph_to_taxprof.py would not work with multiple metadata files, but you can do
zcat metadata_file1.tsv.gz metadata_file2.tsv.gz ... > all_metadata_file.tsv
and the resulting all_metadata_file.tsv should work. Basically, the metadata file is just a 2-column file that maps each genome name to a taxonomy string like "d__bacteria;p__....", see https://github.com/bluenote-1577/sylph/wiki/Integrating-taxonomic-information-with-sylph#custom-taxonomies-and-how-it-works
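The concatenation step can be tried end to end on toy files. This is only a sketch: the genome names and taxonomy strings are hypothetical placeholders, the files are assumed to have no header line, and `gzip -dc` is used as a portable stand-in for `zcat`.

```shell
# Hypothetical two-column metadata (genome name <TAB> taxonomy string).
workdir=$(mktemp -d)
printf 'genomeA.fa\td__Bacteria;p__ExamplePhylum\n' | gzip > "$workdir/metadata_file1.tsv.gz"
printf 'genomeB.fa\td__Eukaryota;p__ExamplePhylum\n' | gzip > "$workdir/metadata_file2.tsv.gz"

# Decompress both files and concatenate them into one plain-text TSV.
gzip -dc "$workdir/metadata_file1.tsv.gz" "$workdir/metadata_file2.tsv.gz" > "$workdir/all_metadata_file.tsv"

# Sanity check: count lines that do not have exactly two tab-separated fields.
bad_lines=$(awk -F '\t' 'NF != 2 { n++ } END { print n + 0 }' "$workdir/all_metadata_file.tsv")
echo "lines without exactly 2 columns: $bad_lines"
```

A nonzero count would suggest a stray header or a malformed row in one of the input files, which is worth checking before handing the combined file to sylph_to_taxprof.py.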
Great -- thanks very much.
@bcpd sylph_to_taxprof.py works with multiple metadata files now. You can do
sylph_to_taxprof.py -m file1.tsv.gz file2.tsv.gz
I forgot when I added this change, but I'm adding this comment now so future readers will not be confused.
Hi -
I'm interested in obtaining estimates of relative abundance across all domains of life. Is this possible with sylph? If so, it's unclear to me if that would entail concatenation of pre-built databases or whether a simple merge (after sylph_to_taxprof.py) would be sufficient.
Thanks very much.