bluenote-1577 / sylph

ultrafast taxonomic profiling and genome querying for metagenomic samples by abundance-corrected minhash.
MIT License

How to estimate read count #19

Open yejunbin opened 1 week ago

yejunbin commented 1 week ago

Hi,

Is it possible for sylph to estimate the read count for each genome or taxon, like MetaPhlAn or Kraken do?

thanks

bluenote-1577 commented 1 week ago

Maybe I will add an option for estimating the read count. Sylph does not classify reads directly, so only an estimate can be provided.

For now, you can estimate the read count for sylph by doing the following:

1) Use the -u option. This multiplies the Sequence abundance column by the % of classified reads.

2) Multiply the Sequence abundance of each row by the number of reads in your dataset. For example, if your fastq file has 3M reads and a genome has a sequence abundance of 5%, then roughly 3,000,000 × 0.05 = 150k reads should be assigned to it.
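
The two steps above can be sketched in Python. This is only an illustration of the arithmetic, not part of sylph: the function name `estimate_read_counts`, the genome names, and the abundance values are made up, and it assumes you have already run sylph with -u and copied each row's Sequence abundance (in percent) into a dictionary.

```python
def estimate_read_counts(seq_abundance_pct, total_reads):
    """Estimate reads per genome from sequence abundances given in percent.

    seq_abundance_pct: dict mapping genome name -> Sequence abundance (%),
                       assumed to come from a sylph run that used -u.
    total_reads:       number of reads in the fastq file.
    """
    return {
        genome: round(total_reads * pct / 100.0)
        for genome, pct in seq_abundance_pct.items()
    }


# Hypothetical values for illustration only.
abundances = {"genome_A.fa": 5.0, "genome_B.fa": 0.5}
estimates = estimate_read_counts(abundances, 3_000_000)

for genome, reads in estimates.items():
    print(f"{genome}\t{reads}")
```

With 3M reads, the 5% genome comes out at 150,000 reads, matching the example above. These are estimates, since sylph does not classify individual reads.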

I'll probably add a feature to do this in a new update.