Open tavareshugo opened 4 months ago
Good question. --covcut is really just to modify the calculation used to generate the "coverage" output (10x by default), as there's no reason everyone needs to stick with that arbitrary choice of threshold. We're working on a more detailed version of our documentation now, and we'll make sure to describe the differences more clearly to users there.
Thanks for clarifying. I guess describing their differences in the docs would indeed help, mostly to avoid people mistakenly using --covcut when they mean to use --depthcutoff, as both options are essentially depth thresholds, but used for different things: calculating coverage and estimating abundances, respectively.
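To make the distinction concrete, here is a minimal sketch of how I understand the two thresholds, not the tool's actual code. The function names and the toy depth list are made up for illustration, and the use of >= in both comparisons is an assumption (the real implementation may use a strict comparison).

```python
def coverage_percent(depths, covcut=10):
    """Percent of positions with depth >= covcut (only affects the reported coverage)."""
    if not depths:
        return 0.0
    return 100.0 * sum(d >= covcut for d in depths) / len(depths)


def sites_for_estimation(depths, depthcutoff):
    """Indices of positions that would be kept for abundance estimation."""
    return [i for i, d in enumerate(depths) if d >= depthcutoff]


# Hypothetical per-position read depths for a tiny 6-position "genome".
depths = [0, 3, 12, 25, 8, 40]

print(coverage_percent(depths, covcut=10))           # 3 of 6 positions -> 50.0
print(sites_for_estimation(depths, depthcutoff=5))   # [2, 3, 4, 5]
```

Changing covcut here only changes the reported number, while changing depthcutoff changes which positions would feed into the estimate.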
Related to this, I wonder if it would be worth outputting the fraction of informative sites above --depthcutoff that were used for the demix step. For example, if there are 1000 total informative sites from UShER, what fraction of those were used by demix?
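A rough sketch of the kind of summary I have in mind, assuming the per-position depths and the list of informative sites are both available at that point. The function name, the inputs and the positions below are hypothetical toy values, not output from the tool, and positions missing from the depth table are treated as depth 0.

```python
def informative_fraction(depth_by_pos, informative_sites, depthcutoff):
    """Fraction of informative sites whose depth is >= depthcutoff."""
    informative = set(informative_sites)
    if not informative:
        return 0.0
    # Positions absent from the depth table are assumed to have depth 0.
    used = sum(1 for pos in informative if depth_by_pos.get(pos, 0) >= depthcutoff)
    return used / len(informative)


# Hypothetical toy data: position -> read depth, plus a list of informative positions.
depth_by_pos = {241: 50, 3037: 4, 14408: 0, 23403: 120}
informative_sites = [241, 3037, 14408, 23403, 28881]

frac = informative_fraction(depth_by_pos, informative_sites, depthcutoff=5)
print(f"{frac:.2f}")  # 2 of 5 informative sites pass -> 0.40
```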
If I understood the documentation correctly, --covcut is only used to calculate the genome coverage at a given depth (10x by default), but it doesn't influence the abundance estimation. Is that correct?
Only --depthcutoff would affect abundance estimation, as it would exclude sites with depth < threshold (0 by default, i.e. all sites with at least 1 read are used).
If that is the case, what is the purpose of having --covcut? It would seem more intuitive to me that the "coverage" value output would be the fraction of the genome that was used for the demix inference step.