foretold-app / widedomain

An experiment to create mathematical models with clean interfaces
MIT License
8 stars 2 forks source link

Symbolic: Support multiplication #44

Closed OAGr closed 4 years ago

OAGr commented 4 years ago

It's pretty evident this would be useful, but it would be tricky. First, try getting things to work with simple multiplication, later we could figure out methods with other functions.

Things to possibly do here: 1) [A distribution] times [a constant] 2) [A constant] times [a constant] 3) [A distribution] times [a distribution]

1 is the most important, it could be fine for one story. 2 is important, maybe that could be done using math.js. 3 is tricky and a bit complicated. It's possible we could do convolution for it, which could be neat.

Gustavo Lacerda is knowledgeable about how to do this. It seems like there are some clever techniques with matrices, especially with (3) and convolution for operations other than multiplication.

skosch commented 4 years ago

A rough proposal, also related to #43: replace `Simple and `PointwiseCombination with a tree of sums and products. So instead of mm(normal(0, 1), normal(10, 5), [0.3, 0.9]) we would accept 0.3*normal(0, 1) + 0.9*normal(10, 5) and convert that into

(+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))

(Obviously the mm would stay for backwards compatibility.) We'll have to implement some checks to make sure there aren't any standalone scalars in the expressions, so a SumNode can't be a leaf, each ProductNode has to have at least one distribution in its subtree, etc.

We can then do the following:

  1. Renormalize the tree: because each distribution integrates to one, we can just evaluate the whole tree to get the total, i.e. (+ (* 0.3 1.0) (* 0.9 1.0)) = 1.2, and to normalize, we just wrap it all in another ProductNode: (* [1/1.2] (+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))). Aggregating two existing trees like that (weight, sum, renormalize) should be trivial. (This does make it important that distributions really do evaluate to 1, so any truncated distribution functions would have get pre-normalized by adding a corresponding factor.)

  2. Recursively simplify. Every ProductNode should have only a scalar and a dist left (or a dist times a dist, or a scalar times another ProductNode). When distributions have a known product, make the respective replacement (most obviously, see list of convolutions of probability distributions, though perhaps there are others). Probably other optimizations, if we care enough. (I'm a little worried about notation. E.g. multiplying two identical Gaussians vs. convolving them has literally the opposite effect. Do users know what they're doing?)

  3. Render into x/y points. We can lazily compile our tree into an infix-format string literal, including the calls to the Jstat functions, and turn that into a new Function (a.k.a. eval). Feels a bit dirty, but seeing as we then go and call that function thousands of times, it may well be the least painful & most performant solution. I'm not married to the idea, but it's how scijs/ndarray does it.

I don't think I understand all of the implications of such a change so would like to discuss in more detail before I start coding. I also don't know how much we can leverage math.js to help with it, nor have I looked at owl yet. Ideas/feedback welcome!

OAGr commented 4 years ago

Super quick thoughts: Will think more and write more later:

1) Good points! Happy you are looking into the math here.

2) I've been thinking of differentiating regular operations from "pointwise" operations. Maybe "pointwise" addition could be written as ".+" instead of "+". What you list in the top equations is "pointwise", which is the alternative to the one used in Guesstimate. I'd write the version you have as:

(.+ (.* 0.3 normal[0,1]) (.* 0.9 normal[10,5]))

This is very different to:

(+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))

Which would treat the addition similar to how it would treat adding two distributions in Guesstimate, where they are added along the x axis, not the y axis, so to speak. The multiplication I mentioned in this story refers to the latter type, not the former.

3) I like small stories :) I'm happy with some of this being a longer-term plan, but like it if it can e done in small incremental chunks if possible. For instance, "Recursively simplify" could be a pretty huge problem, is a very standard desire from a compiler (I would assume), and also seems easy enough to just not do quite yet.

skosch commented 4 years ago

Alright, + vs .+ is a really helpful distinction. So:

OAGr commented 4 years ago

Yep. for *, we may eventually want at least 3 different techniques (but this would be a big project!)

skosch commented 4 years ago

I'm tinkering with multiplications at the moment and am running into the ambiguity between discrete values and multiplicative constants. Currently, standalone scalars are interpreted to mean "put a Dirac-delta distribution at the given value", i.e. 1 := δ(1). But we now want to allow users to use numbers freely to scale distributions along Y, and let functions like mm take care of the renormalization.

Defining 1 := δ(1) creates confusion on both sides:

  1. Does 0.5 * 1 mean δ(0.5) * δ(1) = δ(0.5) or 0.5 * 1 = 0.5 or 0.5 * δ(1)?
  2. Does 0.5 * normal(0, 1) mean δ(0.5) * normal(0, 1) = normal(0, 0.5) or 0.5 * normal(0, 1) (which will be renormalized to normal(0, 1))?

As far as I can see, the only clean solution is to ask users to write e.g. d(1) when they mean the point distribution, and 1 when they mean the constant. What do you think?

OAGr commented 4 years ago

Good point, it seems like there's a lack of clarity here. I think one tricky thing is that one possible reason for typing, 3 * normal(2,1) + 5 * normal(10,10) is that you want to weigh the first distribution by 3, and the second by 5. My impression is that this will be a fairly fringe use case, and we could instead just support it with a separate function. Most of the time, when a user types, 3 * normal(2,1) I would expect they want to take each x-value and multiply that by 3, as in, normal(6,1). That seems to be what users have been doing so far.

My impression is that it should generally work the way it does in Guesstimate; 0.5 * normal(2,1) would do the same as if you took every sample from normal(2,1) and multiplied that sample by 0.5. This would be equivalent to normal(1,1).

Again, there's a distinction between * and .*, so to speak. 0.5 * 1 = δ(0.5). 0.5 .* 1 = 0, as there is no probability mass where they overlap.

This means that your previous example of: (+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5])) being the same as what we are doing with multimodals, should not hold up.

Perhaps we could add a new version of multiplication, called ..*, for distributions. ..* would do something more like what is happening with the multimodals. It probably would have to use sampling with distributions. So, mm(normal(5,2), normal(10,2), [1,3]) is the same as (normal(5,2) ..* 1) +. (normal(10,2) ..* 3).

This could also be done with distributions, like, (normal(5,2) ..* normal(4,1)) +. (normal(10,2) ..* normal(10,2)). The latter would not be a commutative operation. People have done things like this in Guesstimate a fair bit; like, there could be uncertainty in one variable, which would effect the chances that you would have an itself uncertain sales opportunity.

These kinds of operations are all done in code, I guess they are tricky when trying to express a symbolic formulation. I imagine there must be formal names for these varieties.

skosch commented 4 years ago

Most of the time, when a user types, 3 * normal(2,1) I would expect they want to take each x-value and multiply that by 3, as in, normal(6,1)

Is that a typo? If you multiply every sample, you should get normal(6,3), i.e. the product distribution δ(3) * normal(2, 1). I just tried it in Guesstimate and that's indeed what happens. Anyhow, if users expect 3 := δ(3) then we should respect that. Fortunately, that kind of algebra works out (as far as I can tell): (3+4) * normal(0, 1) = (δ(3) + δ(4)) * normal(0, 1) = δ(3+4) * normal(0, 1) as expected, and similarly (3*4) * normal(0, 1) = (δ(3) * δ(4)) * normal(0, 1) = δ(3*4) * normal(0, 1). In other words, any expression that only uses + and * should stay normalized, just like in Guesstimate.

That leaves us with pointwise operations, i.e. expressions that need to be renormalized. We could consider using | and & instead of +. and ..*. They're both recognized by mathjs and personally, I like the boolean union/intersection interpretation:

mm(normal(0, 1), normal(5, 2), [3, 4]) == normalize(c(3) & normal(0, 1) | c(4) & normal(5, 2))

where c(3) is a distribution that has the value 3 everywhere. Similarly, normalize(normal(0, 1) & normal(5, 2)) would be the intersection, aka pointwise product.

OAGr commented 4 years ago

Is that a typo? It wasn't a typo, but it was a math mistake I made. Good point!

"| and & instead of +. and .." Given that they are called "pointwise", it feels intuitive to me to have a point in there for the +. or .+ operations. Agreed that ``..is a bit much, was just using that as an example. I'm not sure we actually need an infix for what I referred to as..*``. The important thing at this point is that we handle it elegantly in the symbolic setup, in part so we can just change multimodals to use that. We don't need to support a syntax for it now, that could come later if at all. Seems pretty rare to me.

OAGr commented 4 years ago

Another nice thing about the dot is that it works with many operations. I could imagine *., +., ^., etc. We could say that | = +., but that would basically be syntax sugar, and I could see that being confusing, as it's a bit of a special case. Happy to run it by Foretold users later though when we get to that point. It would also be obvious if the use case were very frequent, which could be the case.

OAGr commented 4 years ago

For this particular story, the only thing we need to work, is to have normal(5,2) * 3 -> normal(15, 6) via rendering, not symbolic logic (so it's general purpose for all distributions).

The next step is something like allowing, normal(5,2) * 2 * 2 -> normal(20, 8)

If we can get the code for that to work well and be clean that would be a solid step.