Closed whottel closed 11 months ago
Hey Wes,
Thanks for raising this and providing example inputs/outputs for the different versions!
`--depthcutoff` was added to address an issue users were experiencing with the solver failing to converge on a solution for low coverage samples with the given barcodes file. We regularly update this barcodes file with new lineages as they come up, so it's a good idea to occasionally run `freyja update` to ensure you have the most recent barcodes.
In some instances (#137), the sample coverage is high enough to work with some versions of the barcode file, but fails with others. My guess is that when you updated to v1.4.5, it came with an up-to-date `usher_barcodes.csv`, which is now resulting in a Solver Error despite it working fine with the previous barcode version. Could you verify that you're running `demix` with the same barcodes file between the two versions? To get an earlier barcode file, select a previous "updating barcodes and metadata" commit here, and download the corresponding `freyja/data/usher_barcodes.csv`. You can then pass this custom barcode file into `demix` via the `--barcodes` option. I'll try to reproduce the error as well using a few different barcode versions.
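To make the version comparison concrete, here is a minimal sketch of assembling the same `demix` call for both Freyja versions with a pinned barcode file. The helper name `build_demix_cmd` and all file names are hypothetical; only the `--output`, `--barcodes`, and `--depthcutoff` options mentioned in this thread and the Freyja README are used, and the command is built but not executed here.

```python
def build_demix_cmd(variants, depths, output, barcodes=None, depthcutoff=None):
    """Return the argument list for a `freyja demix` call (hypothetical helper)."""
    cmd = ["freyja", "demix", variants, depths, "--output", output]
    if barcodes is not None:
        # Pin the barcode file so both Freyja versions see identical barcodes.
        cmd += ["--barcodes", barcodes]
    if depthcutoff is not None:
        cmd += ["--depthcutoff", str(depthcutoff)]
    return cmd


# Hypothetical file names, for illustration only.
example_cmd = build_demix_cmd(
    "sample_variants.tsv",
    "sample_depths.tsv",
    "sample_demix.tsv",
    barcodes="usher_barcodes_old.csv",
    depthcutoff=10,
)
print(" ".join(example_cmd))
```

The resulting list can be passed to `subprocess.run(example_cmd, check=True)` in each environment, so that the barcode file is the only thing held constant between v1.4.4 and v1.4.5.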
To your point regarding the varying output when using different values for `--depthcutoff`, this option finds SNVs in the barcode file where the sequencing depth is below the specified cutoff value. These sites are then removed from the barcodes, which in many cases results in multiple lineages having the same barcode. These lineages are subsequently grouped into higher-order barcodes based on their shared phylogeny. For your sample using `--depthcutoff 30`, XBB.1.5.24 and XBB.1.5.28 are being grouped into XBB.1.5-like. However, when you use `--depthcutoff 10`, the two sub-lineages are still distinguishable from one another, resulting in them both being listed in the output.
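The collapsing behaviour described above can be illustrated with a toy sketch. This is not Freyja's actual implementation: the lineage names, sites, and depths below are invented, and the grouping here only checks for identical barcodes rather than naming the group from the phylogeny.

```python
from collections import defaultdict


def collapse_by_depth(barcodes, depths, cutoff):
    """Group lineages whose barcodes match once low-depth sites are dropped.

    barcodes: dict mapping lineage name -> set of (position, alt_base) SNVs
    depths:   dict mapping position -> read depth
    cutoff:   sites with depth below this value are removed from every barcode
    """
    groups = defaultdict(list)
    for lineage, snvs in barcodes.items():
        kept = frozenset(s for s in snvs if depths.get(s[0], 0) >= cutoff)
        groups[kept].append(lineage)
    return sorted(sorted(members) for members in groups.values())


# Invented toy data: two lineages share site 100 and differ at sites 200/300.
toy_barcodes = {
    "lineage_A": {(100, "T"), (200, "T")},
    "lineage_B": {(100, "T"), (300, "A")},
}
toy_depths = {100: 50, 200: 20, 300: 15}

print(collapse_by_depth(toy_barcodes, toy_depths, 10))  # lineages stay separate
print(collapse_by_depth(toy_barcodes, toy_depths, 30))  # lineages collapse into one group
```

With a cutoff of 10 every site survives, so the two toy lineages remain distinguishable. With a cutoff of 30 only the shared site at position 100 survives, so both end up with the same barcode and fall into one group, which mirrors the XBB.1.5-like grouping described above.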
Your chosen `--depthcutoff` value is essentially going to be a tradeoff between accuracy and specificity in the final lineage classifications, but 10 should be a reasonable place to start.
-Dylan
Hi Dylan,
Thanks for your response. I do have a follow-up question. Would it be the case that in previous versions, before the `--depthcutoff` option was added, variant sites with only 1x read depth would be included in the lineage abundance calculation, or was there some other minimum read depth?
Thanks, Wes
Generally, how much sequencing depth is enough?
> Hi Dylan,
> Thanks for your response. I do have a follow-up question. Would it be the case that in previous versions, before the `--depthcutoff` option was added, variant sites with only 1x read depth would be included in the lineage abundance calculation, or was there some other minimum read depth?
> Thanks, Wes
Yes, that's correct. Prior to this feature there wasn't any exclusion threshold based on coverage.
Great, thanks for the clarification.
@ybdong919 There isn't really a set threshold that is "enough". The answer tends to depend on what you're trying to infer (you'll need more coverage if you want to recover lineage-level frequencies, but if you just want VoC frequencies you can work with less coverage). As a heuristic, 60% genome coverage at 10x read depth is usually OK, but results will depend strongly on the specific regions of the genome that are covered. The lineage collapse functionality, enabled using the `--depthcutoff` parameter in `demix`, can be useful in figuring out which lineages can be differentiated given the available coverage.
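As a rough way to check a sample against the 60% at 10x heuristic, the sketch below counts the fraction of positions at or above a depth threshold. It assumes a tab-separated depths file with one line per position and the depth in the fourth column; that layout is an assumption here, and the function name and example lines are hypothetical. Freyja's own reported coverage value may be computed differently.

```python
def fraction_covered(depth_lines, min_depth=10):
    """Fraction of listed positions whose depth is at least min_depth.

    depth_lines: iterable of tab-separated lines; the depth is assumed to be
    the fourth field (reference, position, base, depth).
    """
    covered = 0
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        total += 1
        if int(fields[3]) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Invented example: three of five positions reach 10x.
example_lines = [
    "ref\t1\tA\t12",
    "ref\t2\tC\t9",
    "ref\t3\tG\t10",
    "ref\t4\tT\t0",
    "ref\t5\tA\t30",
]
print(fraction_covered(example_lines))
```

For a real file, the same function can be called as `fraction_covered(open(path))`. A single genome-wide fraction says nothing about which regions are covered, so it is only a first check.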
Hello,
I am interested in understanding more about what is going on with the `--depthcutoff` parameter added in version 1.4.5 and choosing a better default value at least in my case. I noticed that some samples produced the following error in v1.4.5: "demix: Solver error encountered, most likely due to insufficient sequencing depth. Try increasing the --depthcutoff parameter." These same samples would generate a demix file as normal with v1.4.4. And depending on the chosen cutoff in version 1.4.5 the top lineages vary, and produce a different result than v1.4.4.
Please find attached depths and variants files of an example impacted sample, and a summary table showing the demix output from different version/parameters.
2312877-SC2WW-IA-VH01284-230809_S82_variants.tsv.xlsx
2312877-SC2WW-IA-VH01284-230809_S82_depths.tsv.xlsx
freyja_demix_comparison.xlsx
I am thinking by default I should not use the value 0 for the cutoff parameter as this causes previously acceptable samples (freyja coverage 81.35% in this case) to fail.
Thanks, Wes