andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

RA not summing to 1 #190

Closed djfeistel closed 10 months ago

djfeistel commented 10 months ago

Is there any reason that the lineage abundances would not sum to 1?

demix:
summarized [('Omicron', 0.9225034440408236), ('Other', 0.0063870700034077904)]
lineages EG.5 FY.5.1 FY.5 HK.17 FY.1.2 XBB.1.16.17 HV.1.1 XBB.1.5.28 XBB.1.16.8 FL.5 HN.3 FY.1.4 FL.1.5.1 XBB.2.3.11 XBB.2.4 XBB.1.16.19 XDA XBB.1.16.18 EG.2.3 FU.5 HA.1 XBB.2.9 GE.1.5 FT.1 XBB.1.5.27 HS.1 JA.1 HK.9 HU.2 XBB.1.5.21 XBB.1.16.24 XCF XBB.1.37.1 FE.1.2 GS.3
abundances 0.18544192 0.12940504 0.10806858 0.08764833 0.08119660 0.06792104 0.06691180 0.04245277 0.02643833 0.01839944 0.01833910 0.01602885 0.01597601 0.01460823 0.01090250 0.00534759 0.00514740 0.00311818 0.00241015 0.00189260 0.00186567 0.00183066 0.00167785 0.00152053 0.00151976 0.00147929 0.00139526 0.00138026 0.00132852 0.00129870 0.00124185 0.00123967 0.00116686 0.00114613 0.00114504
resid 22.828016188293425
coverage 99.77594221315587

We are not using the --eps flag, but if we did, would the results come back with RA summing to 1?

Thanks!

joshuailevy commented 10 months ago

They should always sum to 1 (it's enforced as a constraint in the optimization step). Setting --eps to a small value should make the reported abundances sum to 1, up to machine epsilon anyhow.

Josh

djfeistel commented 10 months ago

So then running freyja demix with the default settings should always sum to 1, yes? Can you think of any reason why we might be seeing these results? Perhaps something else I am not thinking of (brainstorming here :))

joshuailevy commented 10 months ago

Under the hood, it should always sum to 1, but the user will only see lineages with abundances greater than --eps (the default parameter is set so that random lineages at an abundance of 1E-4 are not presented to the user, as they're generally just seq errors or other artifacts). See here: https://github.com/andersen-lab/Freyja/blob/c78d27e7de50a46ff3547963ebe602775b63ee44/freyja/sample_deconv.py#L174
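To make the explanation above concrete, here is a minimal Python sketch of the effect. It is not Freyja's code: the `visible` helper, the toy lineage names, and the cutoff of 1e-3 are all assumptions chosen for illustration. The second half simply adds up the abundances pasted in this issue; they total about 0.9289, which matches the summarized Omicron + Other values, so roughly 0.071 would be spread across lineages that fall below the cutoff if this explanation applies.

```python
import math


def visible(abundances, eps):
    """Hypothetical helper: keep only lineages whose abundance exceeds eps."""
    return {name: a for name, a in abundances.items() if a > eps}


# Toy solution that sums to 1: two dominant lineages plus 100 tiny ones.
full = {"A": 0.60, "B": 0.35}
full.update({f"noise_{i}": 0.0005 for i in range(100)})
assert math.isclose(sum(full.values()), 1.0)

# After thresholding, only A and B remain and the reported sum drops to 0.95.
shown = visible(full, eps=1e-3)  # eps value is an assumption for this toy case
print(sorted(shown))
print(round(sum(shown.values()), 4))

# Abundances copied from the demix output posted in this issue.
issue_abundances = [
    0.18544192, 0.12940504, 0.10806858, 0.08764833, 0.08119660,
    0.06792104, 0.06691180, 0.04245277, 0.02643833, 0.01839944,
    0.01833910, 0.01602885, 0.01597601, 0.01460823, 0.01090250,
    0.00534759, 0.00514740, 0.00311818, 0.00241015, 0.00189260,
    0.00186567, 0.00183066, 0.00167785, 0.00152053, 0.00151976,
    0.00147929, 0.00139526, 0.00138026, 0.00132852, 0.00129870,
    0.00124185, 0.00123967, 0.00116686, 0.00114613, 0.00114504,
]
reported_total = sum(issue_abundances)
print(round(reported_total, 4))      # reported mass, about 0.9289
print(round(1 - reported_total, 4))  # mass not shown, about 0.0711
```

If the full solution is needed, lowering --eps as suggested earlier in the thread should bring the reported total closer to 1.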

djfeistel commented 10 months ago

OK, that makes sense. It wasn't clicking until you explained it that way. Much appreciated, Josh!

joshuailevy commented 10 months ago

No problem! Also, we've got some functionality to handle false positives associated with seq errors and other possible artifacts coming soon... hoping to add it into the main branch in the next week or two :)