bluenote-1577 / sylph

ultrafast taxonomic profiling and genome querying for metagenomic samples by abundance-corrected minhash.
MIT License

Taxonomic profiling for contigs #12

Closed: zehanna closed this issue 2 months ago

zehanna commented 2 months ago

Hi, I want to use sylph to taxonomically profile a set of contigs against the GTDB v220 database. I installed sylph v.0.6.1 with conda.

I sketched a database from the contigs like this:

sylph sketch contigs.fa -i -o contigs

Then I tried to run the taxonomic profiling like this:

sylph profile contigs.syldb v0.3-c200-gtdb-r214.syldb -o contigs_gtdb_sylph_results.tsv

But I get this error:

2024-06-28T12:45:16.610Z INFO [sylph::contain] Obtaining sketches...
2024-06-28T12:45:16.610Z ERROR [sylph::contain] No read files found; see sylph query/profile -h for help. Exiting

Apparently sylph is looking for read files, and doesn't understand it's a contig database. Could you help me fix this problem?

Thanks a lot in advance

bluenote-1577 commented 2 months ago

Hi @zehanna

Sylph is not meant for profiling contigs; it implicitly assumes that the input is redundant, sequenced reads. In this respect it differs from tools such as kraken.

There are ways to hack sylph into profiling contigs (e.g., sketching the contigs as reads with sylph sketch -r contigs.fa), but the results will be less reliable and I don't recommend it; profiling the reads directly would be more accurate here.

Thanks,

Jim
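
For readers who hit the same "No read files found" error, the rough sketch below shows why the original command fails: both inputs end in .syldb, so neither is treated as a read sample. It assumes sylph decides each input's role from its file extension (.syldb and FASTA as databases, .sylsp and FASTQ as samples). The helper name classify_sylph_input and the extension list are illustrative and not taken from the sylph source. The file names reads.fq and read_sketches in the trailing comments are placeholders.

```shell
# Guess how sylph will treat an input file, based only on its extension.
# Hypothetical helper for illustration; the mapping is an assumption.
classify_sylph_input() {
  case "$1" in
    *.syldb) echo "database" ;;                        # sketched genomes or contigs
    *.sylsp) echo "sample" ;;                          # sketched reads
    *.fq|*.fastq|*.fq.gz|*.fastq.gz) echo "sample" ;;  # raw reads
    *.fa|*.fasta|*.fna|*.fa.gz|*.fasta.gz|*.fna.gz) echo "database" ;;  # raw genomes
    *) echo "unknown" ;;
  esac
}

# The two inputs from the failing command in this issue.
samples=0
for f in contigs.syldb v0.3-c200-gtdb-r214.syldb; do
  role=$(classify_sylph_input "$f")
  echo "$f -> $role"
  if [ "$role" = "sample" ]; then
    samples=$((samples + 1))
  fi
done

if [ "$samples" -eq 0 ]; then
  echo "No read sample among the inputs, so there is nothing to profile."
fi

# Recommended route instead: sketch the reads and profile them against GTDB.
#   sylph sketch reads.fq -d read_sketches
#   sylph profile v0.3-c200-gtdb-r214.syldb read_sketches/*.sylsp -o reads_gtdb_sylph_results.tsv
```

With the two inputs from the issue, the loop reports both files as databases and then prints the "No read sample" message, which mirrors the error sylph returned.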

zehanna commented 2 months ago

Hi @bluenote-1577, thanks a lot for the quick answer and clarification!