Closed OAGr closed 4 years ago
A rough proposal, also related to #43: replace `Simple` and `PointwiseCombination` with a tree of sums and products. So instead of `mm(normal(0, 1), normal(10, 5), [0.3, 0.9])` we would accept `0.3*normal(0, 1) + 0.9*normal(10, 5)` and convert that into

`(+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))`

(Obviously the `mm` syntax would stay for backwards compatibility.) We'll have to implement some checks to make sure there aren't any standalone scalars in the expressions, so a `SumNode` can't be a leaf, each `ProductNode` has to have at least one distribution in its subtree, etc.
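For concreteness, such a tree and its validity checks might look like this (a plain-object sketch; only `SumNode`/`ProductNode` come from the proposal, the constructor names and the `Normal` leaf shape are my own placeholders):

```javascript
// Hypothetical node constructors for the proposed expression tree.
const scalar = (value) => ({ type: "Scalar", value });
const normal = (mean, stdev) => ({ type: "Normal", mean, stdev });
const sumNode = (...children) => ({ type: "SumNode", children });
const productNode = (...children) => ({ type: "ProductNode", children });

// A ProductNode must have at least one distribution somewhere in its subtree.
const containsDist = (node) =>
  node.type === "Normal" || (node.children || []).some(containsDist);

const isValid = (node) => {
  switch (node.type) {
    case "Scalar":
    case "Normal":
      return true;
    case "SumNode":
      // A SumNode can't be a leaf, and no child may be a standalone scalar.
      return (
        node.children.length > 0 &&
        node.children.every((c) => c.type !== "Scalar" && isValid(c))
      );
    case "ProductNode":
      return containsDist(node) && node.children.every(isValid);
    default:
      return false;
  }
};

// 0.3*normal(0, 1) + 0.9*normal(10, 5):
const tree = sumNode(
  productNode(scalar(0.3), normal(0, 1)),
  productNode(scalar(0.9), normal(10, 5))
);
```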
We can then do the following:
Renormalize the tree: because each distribution integrates to one, we can just evaluate the whole tree to get the total, i.e. `(+ (* 0.3 1.0) (* 0.9 1.0)) = 1.2`, and to normalize, we just wrap it all in another `ProductNode`: `(* [1/1.2] (+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5])))`. Aggregating two existing trees like that (weight, sum, renormalize) should be trivial. (This does make it important that distributions really do evaluate to 1, so any truncated distribution functions would have to get pre-normalized by adding a corresponding factor.)
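The renormalization step is easy to sketch: evaluate the tree with every distribution replaced by its total mass of 1, then wrap it in a corrective `ProductNode` (node shapes here are my own illustration, not the actual codebase):

```javascript
// Total mass of the tree: every normalized distribution contributes 1,
// scalars evaluate to themselves, sums add, products multiply.
const totalMass = (node) => {
  switch (node.type) {
    case "Scalar":
      return node.value;
    case "Normal":
      return 1.0; // any properly normalized distribution
    case "SumNode":
      return node.children.reduce((acc, c) => acc + totalMass(c), 0);
    case "ProductNode":
      return node.children.reduce((acc, c) => acc * totalMass(c), 1);
  }
};

// (+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5])) has mass 0.3 + 0.9 = 1.2,
// so we wrap it in (* [1/1.2] ...) to renormalize.
const tree = {
  type: "SumNode",
  children: [
    { type: "ProductNode", children: [{ type: "Scalar", value: 0.3 }, { type: "Normal", mean: 0, stdev: 1 }] },
    { type: "ProductNode", children: [{ type: "Scalar", value: 0.9 }, { type: "Normal", mean: 10, stdev: 5 }] },
  ],
};
const mass = totalMass(tree); // 1.2
const normalized = {
  type: "ProductNode",
  children: [{ type: "Scalar", value: 1 / mass }, tree],
};
```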
Recursively simplify. Every `ProductNode` should have only a scalar and a dist left (or a dist times a dist, or a scalar times another `ProductNode`). When distributions have a known product, make the respective replacement (most obviously, see the list of convolutions of probability distributions, though perhaps there are others). Probably other optimizations, if we care enough. (I'm a little worried about notation. E.g. multiplying two identical Gaussians vs. convolving them has literally the opposite effect. Do users know what they're doing?)
Render into x/y points. We can lazily compile our tree into an infix-format string literal, including the calls to the jStat functions, and turn that into a `new Function` (a.k.a. `eval`). Feels a bit dirty, but seeing as we then go on to call that function thousands of times, it may well be the least painful & most performant solution. I'm not married to the idea, but it's how scijs/ndarray does it.
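A minimal sketch of the compile-to-`new Function` idea, with a hand-rolled `normalPdf` standing in for the jStat call (all names here are illustrative):

```javascript
// Compile a tree into an infix expression string over a variable x, then
// turn it into a callable with new Function.
const compile = (node) => {
  switch (node.type) {
    case "Scalar":
      return `${node.value}`;
    case "Normal":
      return `normalPdf(x, ${node.mean}, ${node.stdev})`;
    case "SumNode":
      return `(${node.children.map(compile).join(" + ")})`;
    case "ProductNode":
      return `(${node.children.map(compile).join(" * ")})`;
  }
};

// Stand-in for jStat's normal pdf.
const normalPdf = (x, mean, sd) =>
  Math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * Math.sqrt(2 * Math.PI));

const tree = {
  type: "SumNode",
  children: [
    { type: "ProductNode", children: [{ type: "Scalar", value: 0.3 }, { type: "Normal", mean: 0, stdev: 1 }] },
    { type: "ProductNode", children: [{ type: "Scalar", value: 0.9 }, { type: "Normal", mean: 10, stdev: 5 }] },
  ],
};

const source = compile(tree);
// "((0.3 * normalPdf(x, 0, 1)) + (0.9 * normalPdf(x, 10, 5)))"
const pdf = new Function("x", "normalPdf", `return ${source};`);
const ys = [-1, 0, 5, 10].map((x) => pdf(x, normalPdf));
```

Compiling once and calling the resulting function per x-value is the part that makes the "dirty" `eval`-style approach pay off.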
I don't think I understand all of the implications of such a change, so I'd like to discuss in more detail before I start coding. I also don't know how much we can leverage math.js to help with it, nor have I looked at owl yet. Ideas/feedback welcome!
Super quick thoughts; will think and write more later:
1) Good points! Happy you are looking into the math here.
2) I've been thinking of differentiating regular operations from "pointwise" operations. Maybe "pointwise" addition could be written as `.+` instead of `+`. What you list in the top equations is "pointwise", which is the alternative to the one used in Guesstimate. I'd write the version you have as:

`(.+ (.* 0.3 normal[0,1]) (.* 0.9 normal[10,5]))`

This is very different from:

`(+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))`

which would treat the addition similarly to how Guesstimate treats adding two distributions, where they are added along the x axis, not the y axis, so to speak. The multiplication I mentioned in this story refers to the latter type, not the former.
3) I like small stories :) I'm happy with some of this being a longer-term plan, but I'd like it if it can be done in small incremental chunks where possible. For instance, "Recursively simplify" could be a pretty huge problem, is a very standard compiler problem (I would assume), and also seems easy enough to just not do quite yet.
Alright, `+` vs `.+` is a really helpful distinction. So:

- `.+` is pointwise addition. This is used to construct multimodal distributions.
- `.*` is pointwise multiplication. This is used to express joins/conditionals. This is easy for products of normals.
- `+` refers to the distribution of the sum of two independently distributed variables, i.e. the convolution of their pdfs. This is easy for normals.
- `*` refers to the distribution of the product of two independently distributed variables. This is easy for lognormals and some other special cases.

Yep. For `*`, we may eventually want at least 3 different techniques (but this would be a big project!)
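For the cases above that do have closed forms, the rewrite rules are one-liners (a sketch; I'm assuming the usual (mean, stdev) parameterization for normals and (mu, sigma) of the underlying normal for lognormals). Note how the pointwise product of two identical Gaussians narrows (stdev/√2) while their convolution widens (stdev·√2), the "opposite effect" worry from earlier:

```javascript
// `+` on independent normals (convolution): means add, variances add.
const addNormals = (a, b) => ({
  mean: a.mean + b.mean,
  stdev: Math.sqrt(a.stdev ** 2 + b.stdev ** 2),
});

// `*` on independent lognormals: the underlying normal parameters add.
const multiplyLognormals = (a, b) => ({
  mu: a.mu + b.mu,
  sigma: Math.sqrt(a.sigma ** 2 + b.sigma ** 2),
});

// `.*` on normals (pointwise product, before renormalizing): precision-
// weighted mean, harmonically combined variance.
const pointwiseMultiplyNormals = (a, b) => {
  const pa = 1 / a.stdev ** 2;
  const pb = 1 / b.stdev ** 2;
  return {
    mean: (a.mean * pa + b.mean * pb) / (pa + pb),
    stdev: Math.sqrt(1 / (pa + pb)),
  };
};
```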
I'm tinkering with multiplications at the moment and am running into the ambiguity between discrete values and multiplicative constants. Currently, standalone scalars are interpreted to mean "put a Dirac delta distribution at the given value", i.e. `1 := δ(1)`. But we now want to allow users to use numbers freely to scale distributions along y, and let functions like `mm` take care of the renormalization.
Defining `1 := δ(1)` creates confusion on both sides:

- Does `0.5 * 1` mean `δ(0.5) * δ(1) = δ(0.5)`, or `0.5 * 1 = 0.5`, or `0.5 * δ(1)`?
- Does `0.5 * normal(0, 1)` mean `δ(0.5) * normal(0, 1) = normal(0, 0.5)`, or `0.5 * normal(0, 1)` (which will be renormalized to `normal(0, 1)`)?

As far as I can see, the only clean solution is to ask users to write e.g. `d(1)` when they mean the point distribution, and `1` when they mean the constant. What do you think?
Good point, it seems like there's a lack of clarity here. I think one tricky thing is that one possible reason for typing `3 * normal(2,1) + 5 * normal(10,10)` is that you want to weigh the first distribution by 3 and the second by 5. My impression is that this will be a fairly fringe use case, and we could instead just support it with a separate function. Most of the time, when a user types `3 * normal(2,1)`, I would expect they want to take each x-value and multiply that by 3, as in `normal(6,1)`. That seems to be what users have been doing so far.
My impression is that it should generally work the way it does in Guesstimate: `0.5 * normal(2,1)` would do the same as if you took every sample from `normal(2,1)` and multiplied that sample by 0.5. This would be equivalent to `normal(1,1)`.
Again, there's a distinction between `*` and `.*`, so to speak. `0.5 * 1 = δ(0.5)`, while `0.5 .* 1 = 0`, as there is no probability mass where they overlap.

This means that your previous example of

`(+ (* 0.3 normal[0,1]) (* 0.9 normal[10,5]))`

being the same as what we are doing with multimodals should not hold up.
Perhaps we could add a new version of multiplication, called `..*`, for distributions. `..*` would do something more like what is happening with the multimodals. It probably would have to use sampling with distributions. So, `mm(normal(5,2), normal(10,2), [1,3])` is the same as `(normal(5,2) ..* 1) +. (normal(10,2) ..* 3)`.

This could also be done with distributions, like `(normal(5,2) ..* normal(4,1)) +. (normal(10,2) ..* normal(10,2))`. The latter would not be a commutative operation. People have done things like this in Guesstimate a fair bit; for instance, there could be uncertainty in one variable, which would affect the chances that you would have an itself-uncertain sales opportunity.
These kinds of operations are all done in code; I guess they are tricky when trying to express a symbolic formulation. I imagine there must be formal names for these varieties.
> Most of the time, when a user types `3 * normal(2,1)`, I would expect they want to take each x-value and multiply that by 3, as in `normal(6,1)`.

Is that a typo? If you multiply every sample, you should get `normal(6,3)`, i.e. the product distribution `δ(3) * normal(2, 1)`. I just tried it in Guesstimate and that's indeed what happens. Anyhow, if users expect `3 := δ(3)`, then we should respect that. Fortunately, that kind of algebra works out (as far as I can tell): `(3+4) * normal(0, 1) = (δ(3) + δ(4)) * normal(0, 1) = δ(3+4) * normal(0, 1)` as expected, and similarly `(3*4) * normal(0, 1) = (δ(3) * δ(4)) * normal(0, 1) = δ(3*4) * normal(0, 1)`. In other words, any expression that only uses `+` and `*` should stay normalized, just like in Guesstimate.
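A quick Monte Carlo sanity check of the `3 := δ(3)` reading (the Box–Muller sampler here is mine, purely for illustration):

```javascript
// Multiplying every sample of normal(2, 1) by 3 should give normal(6, 3),
// i.e. the product distribution δ(3) * normal(2, 1).
const sampleNormal = (mean, sd) => {
  // Box–Muller transform; 1 - Math.random() keeps the log argument in (0, 1].
  const u = 1 - Math.random();
  const v = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
};

const n = 200000;
const samples = Array.from({ length: n }, () => 3 * sampleNormal(2, 1));
const mean = samples.reduce((a, x) => a + x, 0) / n;            // ≈ 6
const variance = samples.reduce((a, x) => a + (x - mean) ** 2, 0) / n;
const sd = Math.sqrt(variance);                                  // ≈ 3
```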
That leaves us with pointwise operations, i.e. expressions that need to be renormalized. We could consider using `|` and `&` instead of `+.` and `..*`. They're both recognized by mathjs, and personally, I like the boolean union/intersection interpretation:

`mm(normal(0, 1), normal(5, 2), [3, 4]) == normalize(c(3) & normal(0, 1) | c(4) & normal(5, 2))`

where `c(3)` is a distribution that has the value 3 everywhere. Similarly, `normalize(normal(0, 1) & normal(5, 2))` would be the intersection, a.k.a. the pointwise product.
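The `|`/`&` semantics can be sketched on a shared discretization grid (a grid-based illustration; none of these helper names are the actual codebase API):

```javascript
// Discretize two pdfs over a shared grid, then apply pointwise union
// (| = weighted pointwise sum) and intersection (& = pointwise product),
// renormalizing afterwards.
const grid = (lo, hi, n) =>
  Array.from({ length: n }, (_, i) => lo + ((hi - lo) * i) / (n - 1));

const normalPdf = (x, mean, sd) =>
  Math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * Math.sqrt(2 * Math.PI));

const xs = grid(-10, 20, 3001);
const dx = xs[1] - xs[0];
const f = xs.map((x) => normalPdf(x, 0, 1));
const g = xs.map((x) => normalPdf(x, 5, 2));

// c(3) & f | c(4) & g, before normalizing: mass 3 + 4 = 7.
const pointwiseSum = f.map((y, i) => 3 * y + 4 * g[i]);
// f & g: pointwise product.
const pointwiseProduct = f.map((y, i) => y * g[i]);

const normalize = (ys) => {
  const total = ys.reduce((a, y) => a + y, 0) * dx;
  return ys.map((y) => y / total);
};

const mixture = normalize(pointwiseSum);
const intersection = normalize(pointwiseProduct);
```

For normals the intersection even has a known closed form (the precision-weighted normal), which is a handy cross-check for the grid version.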
> Is that a typo?

It wasn't a typo, but it was a math mistake I made. Good point!
"| and & instead of +. and .."
Given that they are called "pointwise", it feels intuitive to me to have a point in there for the +.
or .+
operations. Agreed that ``..is a bit much, was just using that as an example. I'm not sure we actually need an infix for what I referred to as
..*``. The important thing at this point is that we handle it elegantly in the symbolic setup, in part so we can just change multimodals to use that. We don't need to support a syntax for it now, that could come later if at all. Seems pretty rare to me.
Another nice thing about the dot is that it works with many operations. I could imagine `*.`, `+.`, `^.`, etc. We could say that `| = +.`, but that would basically be syntax sugar, and I could see that being confusing, as it's a bit of a special case. Happy to run it by Foretold users later, though, when we get to that point. It would also become obvious if the use case were very frequent, which could be the case.
For this particular story, the only thing we need to work is to have

`normal(5,2) * 3` -> `normal(15, 6)`

via rendering, not symbolic logic (so it's general-purpose for all distributions).

The next step is something like allowing

`normal(5,2) * 2 * 2` -> `normal(20, 8)`

If we can get the code for that to work well and be clean, that would be a solid step.
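That rendering-based step could look like this via sampling, which stays general-purpose because it never inspects the distribution symbolically (the sampler and helper names are illustrative, not the actual codebase API):

```javascript
// Scale a distribution by rendering: draw samples and multiply each by the
// constant(s), so normal(5, 2) * 2 * 2 behaves like normal(20, 8) without
// any distribution-specific logic.
const sampleNormal = (mean, sd) => {
  // Box–Muller transform.
  const u = 1 - Math.random();
  const v = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
};

const scaleSamples = (sampler, ...constants) => () =>
  constants.reduce((x, c) => x * c, sampler());

const scaled = scaleSamples(() => sampleNormal(5, 2), 2, 2);
const n = 200000;
const xs = Array.from({ length: n }, scaled);
const mean = xs.reduce((a, x) => a + x, 0) / n;                          // ≈ 20
const sd = Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / n);   // ≈ 8
```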
It's pretty evident this would be useful, but it would be tricky. First, try getting things to work with simple multiplication; later we could figure out methods for other functions.

Things to possibly do here:
1) [A distribution] times [a constant]
2) [A constant] times [a constant]
3) [A distribution] times [a distribution]

1 is the most important; it could be fine for one story. 2 is important; maybe that could be done using math.js. 3 is tricky and a bit complicated. It's possible we could do convolution for it, which could be neat.
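Case 3 is at least straightforward to approximate by sampling, even before any symbolic rule exists (an illustrative sketch; the mean check relies on E[XY] = E[X]·E[Y] for independent variables):

```javascript
// [A distribution] times [a distribution], via sampling: draw one sample
// from each factor and multiply.
const sampleNormal = (mean, sd) => {
  // Box–Muller transform.
  const u = 1 - Math.random();
  const v = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
};

const sampleProduct = (drawA, drawB) => drawA() * drawB();

const n = 300000;
const xs = Array.from({ length: n }, () =>
  sampleProduct(() => sampleNormal(5, 2), () => sampleNormal(10, 3))
);
// For independent X ~ N(5, 2) and Y ~ N(10, 3): E[XY] = 5 * 10 = 50.
const mean = xs.reduce((a, x) => a + x, 0) / n;
```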
Gustavo Lacerda is knowledgeable about how to do this. It seems like there are some clever techniques with matrices, especially with (3) and convolution for operations other than multiplication.