fathominfo / delphy-web

implement basic check for auto burn in prompt #8

Closed mark-fathom closed 4 weeks ago

mark-fathom commented 1 month ago

Detect a not-awful place to set the burn-in threshold.

mark-fathom commented 1 month ago

There's more we can do to make this obvious and apparent, but this is a start.

@patvarilly I went with two standard deviations rather than three: the latter included a lot more gunk. If you want to test it out, it's driven by the constant `ACCEPTABLE_RANGE_IN_STDDEV` at the top of `src/ts/ui/runner/burninprompt.ts`.
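For readers following along, here is a minimal sketch of what a standard-deviation-based knee suggestion could look like. It is not the code in `burninprompt.ts`: only the constant name `ACCEPTABLE_RANGE_IN_STDDEV` comes from this thread. The function names, the choice of the second half of the trace as the reference for the mean and standard deviation, and the backward walk from the latest sample are all assumptions made for illustration.

```typescript
// Hypothetical sketch; only ACCEPTABLE_RANGE_IN_STDDEV is named in the thread.
const ACCEPTABLE_RANGE_IN_STDDEV = 2;

function traceMean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function traceStdDev(xs: number[]): number {
  const m = traceMean(xs);
  const sumSq = xs.reduce((acc, x) => acc + (x - m) ** 2, 0);
  return Math.sqrt(sumSq / xs.length);
}

/**
 * Returns the index of the earliest sample in the unbroken in-range run that
 * ends at the latest sample. Assumption: the second half of the trace is
 * treated as "settled" and supplies the reference mean and std dev.
 */
function suggestKneeIndex(
  trace: number[],
  rangeInStdDev: number = ACCEPTABLE_RANGE_IN_STDDEV
): number {
  if (trace.length === 0) return 0;
  const tail = trace.slice(Math.floor(trace.length / 2));
  const m = traceMean(tail);
  const limit = rangeInStdDev * traceStdDev(tail);
  let knee = trace.length;
  // Walk backward until the first sample that falls outside the range.
  while (knee > 0 && Math.abs(trace[knee - 1] - m) <= limit) {
    knee--;
  }
  return knee;
}

// A trace that ramps down during burn-in and then settles around 10.
const exampleTrace = [100, 80, 60, 40, 20, 10, 11, 9, 10, 10, 11, 9];
console.log(suggestKneeIndex(exampleTrace));
```

Because the walk stops at the first out-of-range sample, a narrow range can cut the run short on an ordinary fluctuation, which is why the choice of range matters so much in the discussion that follows.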

mark-fathom commented 1 month ago

Question: if the user never overrides the auto-suggested knee, should we export the kneeIndex? On reload, there would be no way to distinguish between the auto-selected knee and a curated knee. We can do a bunch of juggling to unset the knee on export and then reset it, and auto-set the knee on load, or we can add a flag to the export, something like kneeIsManuallySet or kneeIsCurated?

patvarilly commented 1 month ago

Good point! I would go for not saving the knee on export, then auto-set it on load if it hasn't yet been chosen. That leaves your hands free to implement other suggestions for the knee in the future, and also adds a good suggested knee to previously saved runs where a user "accidentally" forgot to set a knee.
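A rough sketch of that export/load policy, under stated assumptions: the `SavedRun` and `RunState` shapes, the `kneeChosenByUser` field, and the `suggestKnee` callback are hypothetical and do not come from the delphy-web source. The idea is simply that the knee is written out only when the user picked it, and that a missing knee triggers an automatic suggestion on load.

```typescript
// Hypothetical shapes: none of these names come from the delphy-web source.
interface SavedRun {
  samples: number[];
  kneeIndex?: number; // present only when the user chose the knee
}

interface RunState {
  samples: number[];
  kneeIndex: number;
  kneeChosenByUser: boolean;
}

// Export: drop an auto-suggested knee so it can be recomputed later.
function exportRun(state: RunState): SavedRun {
  const saved: SavedRun = { samples: state.samples };
  if (state.kneeChosenByUser) {
    saved.kneeIndex = state.kneeIndex;
  }
  return saved;
}

// Load: keep a saved knee as curated, otherwise ask for a fresh suggestion.
function loadRun(
  saved: SavedRun,
  suggestKnee: (samples: number[]) => number
): RunState {
  if (saved.kneeIndex !== undefined) {
    return { samples: saved.samples, kneeIndex: saved.kneeIndex, kneeChosenByUser: true };
  }
  return {
    samples: saved.samples,
    kneeIndex: suggestKnee(saved.samples),
    kneeChosenByUser: false,
  };
}
```

With this layout, older saved runs that never had a knee set go through the same path as runs with an auto-suggested knee, so they pick up whatever suggestion logic is current at load time.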

patvarilly commented 1 month ago

So, there's a good answer, and there's a fudge.

The good answer is as follows. If you were sure that the samples in the traces were independent, then you would expect some samples to be more than 1, 2, 3, ... std devs away from the mean just by chance (the usual numbers are that 68% of samples are within 1 s.d., 95% are within 2 s.d., 99.7% are within 3 s.d., 99.994% are within 4 s.d. and 99.99994% are within 5 s.d.). If a fraction x is outside the test range, then the number of in-range samples you see before you first hit an out-of-range sample follows a Geometric distribution ( https://en.wikipedia.org/wiki/Geometric_distribution) with p = x, which has mean (1-p)/p. So for 1, 2, 3, 4, 5 s.d.s, that's respectively 2.1, 19, 332, 16.7k and 1.67M. You could go a little further and pick an s.d. threshold such that there's, say, a 99% chance that you won't cut off any of the N samples in your run just by chance. But that's probably overkill for Delphy web. Instead, I'd go with "3 or 4 std devs" and call it a day.
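The arithmetic above can be checked with a few lines of TypeScript. This is only a sketch: it uses the rounded coverage percentages quoted in the comment rather than exact normal tail probabilities, and the function names are made up for illustration. The second function covers the "go a little further" idea by computing the chance that none of N independent samples falls out of range.

```typescript
// Two-sided normal coverage per threshold, rounded as in the comment above.
const coverageBySigma: Array<[number, number]> = [
  [1, 0.68],
  [2, 0.95],
  [3, 0.997],
  [4, 0.99994],
  [5, 0.9999994],
];

// Mean of a geometric distribution that counts in-range samples seen
// before the first out-of-range one: (1 - p) / p, with p = 1 - coverage.
function expectedInRangeRun(coverage: number): number {
  const p = 1 - coverage;
  return (1 - p) / p;
}

// Probability that none of n independent samples is out of range.
function probNoFalseCut(coverage: number, n: number): number {
  return Math.pow(coverage, n);
}

for (const [sigma, coverage] of coverageBySigma) {
  console.log(
    sigma,
    expectedInRangeRun(coverage).toPrecision(3),
    probNoFalseCut(coverage, 1777).toFixed(4)
  );
}
```

Taking N = 1777 as an example run length, the chance of never hitting a spurious out-of-range sample is roughly 0.5% at 3 s.d., about 90% at 4 s.d., and about 99.9% at 5 s.d., again assuming independent samples.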

The fudge is that you don't have independent samples in the traces yet. You'd need to calculate the Effective Sample Size (ESS) and then calculate the std. dev. of the traces using that instead of the actual number of samples in the trace, i.e., std dev = sqrt([sum_i (x_i - mean_x)^2] / ESS) instead of sqrt([...] / N) (there's also a -1 correction in the denominator that, between friends, we'll ignore). If you did have ESSs handy (and I hope we do some day), then that together with 3-4 std devs would seem like a great heuristic. Since we don't yet, going up to 5 "std devs" seems a good compromise for now.
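As a sketch of the formula in the previous paragraph, the function below divides the sum of squared deviations by the ESS instead of N. The name `essAdjustedStdDev` is hypothetical, and the ESS is assumed to be supplied by the caller, since estimating it from the trace is exactly the piece that is not available yet. As in the comment, the -1 correction is ignored.

```typescript
// Hypothetical helper: std dev with the ESS in place of the sample count.
// `ess` must be provided by the caller; this sketch does not estimate it.
function essAdjustedStdDev(trace: number[], ess: number): number {
  if (trace.length === 0 || ess <= 0) return NaN;
  const meanX = trace.reduce((a, b) => a + b, 0) / trace.length;
  const sumSq = trace.reduce((acc, x) => acc + (x - meanX) ** 2, 0);
  return Math.sqrt(sumSq / ess);
}

// With ess equal to the trace length this is the ordinary (population) std dev;
// a smaller ess widens the estimate.
console.log(essAdjustedStdDev([1, 2, 3, 4, 5], 5));
console.log(essAdjustedStdDev([1, 2, 3, 4, 5], 2.5));
```

Since correlated samples give an ESS below N, the adjusted standard deviation is larger than the naive one, so the same "3-4 std devs" window becomes wider in absolute terms.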

TL;DR: a cutoff of 5 s.d. seems reasonable to me for now.

On Tue, Oct 22, 2024 at 3:57 PM Mark Schifferli @.***> wrote:

@.**** commented on this pull request.

In src/ts/ui/runner/runui.ts https://github.com/fathominfo/delphy-web/pull/8#discussion_r1810783531:

   this.mutCountCanvas.setData(numMutationsHist, kneeIndex, mccIndex, hideBurnIn, sampleIndex);
-  this.popGrowthCanvas.setData(popGHist.map(g=>POP_GROWTH_FACTOR/g), kneeIndex, mccIndex, hideBurnIn, sampleIndex);
+  this.popGrowthCanvas.setData(popHistGrowth, kneeIndex, mccIndex, hideBurnIn, sampleIndex);
+  if (this.pythia && this.pythia.kneeIndex <= 0) {

Hrm, consistently updating the knee from the latest sample results in only a few sampled trees. Here's a long run of 1777 samples, and you can see the knee allows only 4: Screen.Shot.2024-10-22.at.9.26.44.AM.png (view on web) https://github.com/user-attachments/assets/f3be0547-6f32-4dbd-b294-d56be691e2ca

Exporting and reloading the file with the acceptable range at 3 standard deviations rather than just 2, we get 53 samples; at 4 std dev we get 56; at 5, 1312.

Hrm, if we repeatedly calculate the knee, then I retract my comment about needing to set the limit at 2 std dev. That's too restrictive, at least for this data set. 5 doesn't look that bad while in progress: Screen.Shot.2024-10-22.at.9.53.57.AM.png (view on web) https://github.com/user-attachments/assets/fb7b5184-265a-45d2-adcc-7771ad25e12d

@patvarilly what do you think?

N.B. still have to work out the mechanics of distinguishing between the auto and user selected knee.

mark-fathom commented 1 month ago

I'm gonna solicit some feedback on the ui for curating vs auto-setting the burnin, but @patvarilly I think your requested changes are ready.

mark-fathom commented 1 month ago

Awesome, thank you for updating the algorithm!