The only way to compute an expectation would be to compute the full distribution, so it can't be an expectation over natural numbers.
I added an example for the question "what's the probability that HTH comes before HHT?" Weirdly, in general this question can have a different answer from the question "which pattern has the lower expected number of flips?"
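For what it's worth, here's a quick Monte Carlo sanity check of that distinction (plain Python, fair coin assumed, independent of our code):

```python
# Estimate P(HTH appears before HHT) by simulating coin-flip sequences.
import random

def first_pattern(p1="HTH", p2="HHT"):
    seq = ""
    while True:
        seq += random.choice("HT")
        if seq.endswith(p1):
            return p1
        if seq.endswith(p2):
            return p2

trials = 100_000
wins = sum(first_pattern() == "HTH" for _ in range(trials))
print(wins / trials)  # compare against the .ppl example's output
```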
We can compute the expectation by processing the resulting `Nat` with the following function:

```
define weigh = \n. case n of
  | Zero   -> sample fail
  | Succ n -> amb unit (weigh n);
```
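To spell out why this gives the expectation: `weigh n` succeeds along exactly n branches of weight 1 each (one `unit` per `Succ`), so the program's total weight is sum_n p(n) * n = E[n]. A trivial sketch with a made-up distribution:

```python
# weigh n contributes weight n, so summing over a (hypothetical)
# distribution p over Nat turns the total weight into the expectation.
p = {0: 0.25, 1: 0.5, 2: 0.25}  # made-up p(n)
print(sum(pn * n for n, pn in p.items()))  # 1.0 = E[n]
```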
I made autodiff work with this patch, which probably breaks things:
```diff
diff --git a/bin/sum_product.py b/bin/sum_product.py
index 71a45fa..82def82 100755
--- a/bin/sum_product.py
+++ b/bin/sum_product.py
@@ -27,5 +27,19 @@ if __name__ == '__main__':
     for el, fac in fgg.interp.factors.items():
         fac.weights = torch.as_tensor(fac.weights, dtype=torch.get_default_dtype())
-
-    print(json.dumps(fggs.formats.weights_to_json(fggs.sum_product(fgg, method=args.method))))
+
+    for name, weights in args.weights:
+        el = fgg.grammar.get_edge_label(name)
+        fgg.interp.factors[el].weights.requires_grad_()
+
+    sp = fggs.sum_product(fgg, method=args.method)
+    if args.weights:
+        sp.backward(torch.sparse_coo_tensor([[0] for _ in sp.size()],
+                                            [1], sp.size()).to_dense())
+    print(json.dumps(fggs.formats.weights_to_json(sp)))
+    print(json.dumps({
+        name: fggs.formats.weights_to_json(
+            fgg.interp.factors[fgg.grammar.get_edge_label(name)]
+            .weights.grad)
+        for name, weights in args.weights
+    }))
```
Now `coin_unfair.ppl` produces `{"unfair": [2.3809525966644287, 2.3809525966644287]}`, and the fact that the two numbers are equal means that indeed the derivative of the von Neumann example is zero.
If I remove `.to_dense()` above, then there's always an addition error that I can fix, and sometimes a multiplication error that I don't know how to fix (especially since printing the tensors involved seems to make the error go away)...
The `weigh` trick sounds like it would work. Have you tried it?
We could also compute expectations using gradients (gradient of log Z with respect to log factor = expected count of factor).
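A minimal PyTorch sketch of that identity, with toy weights rather than the fggs API:

```python
# Toy check that d(log Z)/d(log theta_f) = E[count of factor f],
# where outcome x has unnormalized weight prod_f theta_f^(n_f(x)).
import torch

log_theta = torch.log(torch.tensor([0.3, 0.7]))
log_theta.requires_grad_()

# counts[x, f] = how many times outcome x uses factor f (made-up numbers)
counts = torch.tensor([[2.0, 0.0],
                       [1.0, 1.0]])
log_w = counts @ log_theta               # log w(x) = sum_f n_f(x) log theta_f
log_Z = torch.logsumexp(log_w, dim=0)
log_Z.backward()

p = torch.softmax(log_w, dim=0).detach()  # p(x) = w(x) / Z
print(log_theta.grad)                     # gradient of log Z
print(p @ counts)                         # expected counts; should match
```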
Why do you need to use sparse tensors?
@ccshan Please try out https://github.com/diprism/fgg-implementation/pull/147 for computing both gradients and expectations.
> Why do you need to use sparse tensors?
Probably just premature optimization on my part...
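If we don't need the sparsity, the cotangent could just be built densely, something like this (a sketch; `sp` and `torch` as in the patched `sum_product.py` above):

```python
# Dense one-hot cotangent at index (0, ..., 0); a possible drop-in for the
# sparse_coo_tensor(...).to_dense() expression in the patch.
one_hot = torch.zeros(sp.size())
one_hot[(0,) * sp.dim()] = 1
sp.backward(one_hot)
```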
> The `weigh` trick sounds like it would work. Have you tried it?
Yes! I got 7 and 5 out of `code_{hth,hht}.ppl`.
I just pushed fixes and renamed to `pattern1.ppl`. (The right answers should be 10 and 8 -- it was an off-by-3 error!)
There's also a `pattern2.ppl` that uses a different method for expectations and gives the same answer.
Should we delete examples that are no longer interesting? Or maybe split `code/` into `tests/` and `examples/`?
What mechanism(s) can we add to check that the tests actually produce the correct output?
Maybe the third option won't be so bad to implement if we limit ourselves to linear equations? Is there a standard linear solver?
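If the equations really are linear, the unknowns presumably satisfy something of the form x = A x + b, and any standard solver handles that; PyTorch ships `torch.linalg.solve`. A sketch with made-up coefficients:

```python
# If the sum-product equations are linear, x = A @ x + b, then
# x = (I - A)^{-1} b; torch.linalg.solve does this without explicit inversion.
import torch

A = torch.tensor([[0.0, 0.5],
                  [0.3, 0.0]])   # made-up coefficients
b = torch.tensor([0.5, 0.7])     # made-up constants
x = torch.linalg.solve(torch.eye(2) - A, b)
print(x, A @ x + b)               # the two should agree
```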
I don't know what kind of testing framework is usually used, but it would be good to have a bunch of example programs and their corresponding FGGs.
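One possibility (just a sketch; the `tests/` layout and driver script here are hypothetical) would be a pytest harness that runs every example and compares against stored expected output:

```python
# Hypothetical pytest harness: run each .ppl example and compare its JSON
# output against a checked-in expected file next to it.
import json
import pathlib
import subprocess
import pytest

EXAMPLES = sorted(pathlib.Path("tests").glob("*.ppl"))

@pytest.mark.parametrize("ppl", EXAMPLES, ids=lambda p: p.name)
def test_example(ppl):
    expected = json.loads(ppl.with_suffix(".expected.json").read_text())
    out = subprocess.run(["bin/run_example.sh", str(ppl)],  # hypothetical driver
                         capture_output=True, text=True, check=True)
    actual = json.loads(out.stdout)
    assert actual.keys() == expected.keys()
    for name in expected:  # tolerant float comparison per factor
        assert actual[name] == pytest.approx(expected[name])
```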