foretold-app / widedomain

An experiment to create mathematical models with clean interfaces
MIT License

Multimodals should be dealt with better #58

Closed OAGr closed 4 years ago

OAGr commented 4 years ago

Currently they are quite slow and lossy. I think part of this is because they are handled with each pair combined separately; this seems suboptimal.

I think there should be a way to combine a bunch of distributions at once, like it used to be, instead of only accepting functions with 2 arguments.

This could mean we should make a new function in ExpressionTree that accepts an array of nodes, instead of only 2.
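
Something like this, roughly (the constructor names and the `mm` helper below are made up for illustration, not the actual ExpressionTree API):

  /* Hypothetical sketch: a tree node that can hold any number of children,
     so a multimodal is combined in one pass instead of as a chain of
     binary combines. */
  type node =
    | Normal(float, float) /* mean, stdev */
    | PointwiseCombine(node, node) /* current shape: exactly two children */
    | PointwiseCombineMany(array(node)); /* proposed: an array of children */

  /* mm(a, b, c) would then build a single PointwiseCombineMany node rather
     than PointwiseCombine(a, PointwiseCombine(b, c)). */
  let mm = (dists: array(node)): node => PointwiseCombineMany(dists);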

Example input: `mm(normal(-5, 1), normal(0,1), normal(5, 1), normal(8,1), normal(12,1), normal(16,1))+33`

Old version:

(screenshot)

New version:

(screenshot)
skosch commented 4 years ago

I'm open to making an array version, but I can't reproduce the lossiness (it's due to downsampling; just set that to 1000 like it used to be) nor the slowness (it's pretty fast here).

I'd be surprised if there's any real difference actually. Each individual distribution is rendered at n points, and then they're all combined, one by one, via linear interpolation. That hasn't changed at all :thinking:

OAGr commented 4 years ago

The string `mm(normal(0, 1), normal(4, 1), normal(8, 1), normal(12, 1), normal(16,1))` is definitely much slower for me in development than when run in the app here.

I may be misunderstanding how this is processed. My impression is that originally, with multimodals, the algorithm worked like this:

  1. find the min and max of all the distributions
  2. take n points between the totalMin and totalMax, then calculate each pdf at each of those points. Sum these.

The new algorithm seems to work like this: do (1) and (2) above for the first two distributions, then combine those two. Then do (1) and (2) on that combination plus the next distribution, and so on. If that's true, I'd imagine it would keep choosing different points to do the calculations on, meaning it would need to do a fair bit more interpolation (on each of the intermediate nodes, as well as the subnodes).
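
To make the contrast concrete, here's a rough sketch of the "all at once" version of (1) and (2); the names and types are illustrative, not the actual code:

  /* Illustrative sketch of the "all at once" approach: build one shared
     x-grid spanning the global [totalMin, totalMax], then evaluate every
     component pdf on that same grid and sum. No intermediate re-gridding
     or interpolation is needed. */
  let combineAllAtOnce =
      (pdfs: array((float) => float), totalMin: float, totalMax: float, n: int) => {
    /* n evenly spaced points between totalMin and totalMax (assumes n > 1) */
    let xs =
      Array.init(n, i =>
        totalMin
        +. (totalMax -. totalMin)
        *. float_of_int(i)
        /. float_of_int(n - 1)
      );
    /* at each x, sum the pdfs of all components */
    let ys =
      Array.map(
        x => Array.fold_left((acc, pdf) => acc +. pdf(x), 0.0, pdfs),
        xs,
      );
    (xs, ys);
  };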

skosch commented 4 years ago

Ah, interesting. You're absolutely right. So the old code first joins all the individual ByWeight xs, and then evaluates the pdfs of all functions everywhere:

  /* Old implementation: build one shared x-grid out of all the component
     distributions, then evaluate the mixture pdf on that shared grid. */
  let continuousShape = (dists: t, sampleCount: int) => {
    /* take sampleCount / length(dists) ByWeight x-points from each
       component and concatenate them into one array */
    let xs =
      dists
      |> E.A.fmap(r =>
           r
           |> fst
           |> GenericSimple.interpolateXs(
                ~xSelection=`ByWeight,
                _,
                sampleCount / (dists |> E.A.length),
              )
         )
      |> E.A.concatMany;
    /* sort in place so the combined xs form an ascending grid */
    xs |> Array.fast_sort(compare);
    /* evaluate the full mixture pdf at every shared x */
    let ys = xs |> E.A.fmap(pdf(_, dists));
    XYShape.T.fromArrays(xs, ys) |> Distributions.Continuous.make(`Linear, _);
  };

The new algorithm works as you described: it renders each pdf, and then interpolates them together.

I underestimated the slowness of the interpolation step. It does a binary search for the corresponding array indices at each step of the loop (so it's something like O((n+m)*(log n + log m))). I've now added an alternative implementation that loops through both arrays simultaneously and interpolates as it goes along (should be O(n+m)).
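
For reference, here's a simplified stand-alone sketch of that linear-scan idea (not the exact implementation I added):

  /* Walk both sorted x-arrays with two indices, emit the union of xs, and
     linearly interpolate the "other" shape's y at each emitted x: one pass,
     O(n + m), instead of a binary search per point. (Exact ties between the
     two x-arrays are emitted twice in this sketch.) */
  let interpolate = (xs, ys, j, x) =>
    /* y of shape (xs, ys) at x, assuming xs[j - 1] <= x <= xs[j];
       clamps to the end values when j is out of range */
    if (j <= 0) {
      ys[0];
    } else if (j >= Array.length(xs)) {
      ys[Array.length(xs) - 1];
    } else {
      let t = (x -. xs[j - 1]) /. (xs[j] -. xs[j - 1]);
      ys[j - 1] *. (1.0 -. t) +. ys[j] *. t;
    };

  let combineLinear = (xs1, ys1, xs2, ys2, op) => {
    let (n, m) = (Array.length(xs1), Array.length(xs2));
    let rec go = (i, j, acc) =>
      if (i >= n && j >= m) {
        List.rev(acc); /* (x, combined y) pairs in ascending x order */
      } else if (j >= m || (i < n && xs1[i] <= xs2[j])) {
        let x = xs1[i];
        go(i + 1, j, [(x, op(ys1[i], interpolate(xs2, ys2, j, x))), ...acc]);
      } else {
        let x = xs2[j];
        go(i, j + 1, [(x, op(interpolate(xs1, ys1, i, x), ys2[j])), ...acc]);
      };
    go(0, 0, []);
  };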

Whether or not this is faster should now depend entirely on which takes longer: evaluating e.g. a Gaussian at a single point, or doing a single linear interpolation (correct me if I'm missing something). That's something we can benchmark in the future.
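
If we do get around to benchmarking it, a rough stand-alone sketch could look like the following (Sys.time is just a placeholder here; in the browser we'd want a proper timer):

  /* Rough micro-benchmark sketch: a unit-normal pdf evaluation vs. a single
     linear interpolation. Stand-alone code, not from the codebase. */
  let gaussianPdf = (mean, stdev, x) => {
    let z = (x -. mean) /. stdev;
    exp(0.0 -. 0.5 *. z *. z) /. (stdev *. sqrt(2.0 *. 3.14159265358979));
  };

  let lerp = (x0, y0, x1, y1, x) =>
    y0 +. (y1 -. y0) *. (x -. x0) /. (x1 -. x0);

  let time = (label, n, f) => {
    let t0 = Sys.time();
    for (_i in 1 to n) {
      ignore(f());
    };
    Printf.printf("%s: %.3fs for %d calls\n", label, Sys.time() -. t0, n);
  };

  time("gaussian pdf", 1_000_000, () => gaussianPdf(0.0, 1.0, 0.3));
  time("linear interp", 1_000_000, () => lerp(0.0, 1.0, 1.0, 2.0, 0.3));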

Can you confirm whether this improves things on your end?

PS. I do think there's value in keeping track of minX and maxX as we recurse through the expression tree (not least to bound the sampler, when we have to use it), and we should probably do that in the future.
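
A tiny sketch of what that could look like as a fold over the tree (the node type here is a stand-in, not the real ExpressionTree):

  /* Hypothetical sketch: carry (minX, maxX) up the tree so the final
     support bounds are known before sampling or rendering. */
  type bounds = {minX: float, maxX: float};

  type node =
    | Leaf(bounds)
    | Combine(array(node));

  let rec boundsOf = (n: node): bounds =>
    switch (n) {
    | Leaf(b) => b
    | Combine(children) =>
      Array.fold_left(
        (acc, child) => {
          let b = boundsOf(child);
          ({minX: min(acc.minX, b.minX), maxX: max(acc.maxX, b.maxX)});
        },
        {minX: infinity, maxX: neg_infinity},
        children,
      )
    };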

OAGr commented 4 years ago

TLDR: Things seem fine to me now if we assume downsampling to 1000 points or fewer, but it does seem a bit slower when the downsampling param is removed. I'm not sure exactly how this param is being used now.


Testing:

First, I removed the "Truncate To" parameter. These are the resulting stats: it looks like the first version took about 150ms per key change, while the second (the new one) took around 400ms.

This is what it is on master for me:

(screenshot)

This is what it is on the current epic branch:

(screenshot)
skosch commented 4 years ago

Interesting, thanks for sharing the screenshots! I'm curious about the breakdown of the render and combine parts at the very bottom of the second chart ...

But yeah, let me find out what the sampleCounts are when the truncateTo/downsampleTo field is empty.

OAGr commented 4 years ago

I do recommend spending some time in the performance flame chart, at least in Firefox; I've found it to be really useful for optimization.

OAGr commented 4 years ago

I'm going to close this, as it's not a huge deal right now.