diprism / fggs

Factor Graph Grammars in Python
MIT License

Identity factors #163

Closed ccshan closed 1 year ago

ccshan commented 1 year ago

This is to fix #128

ccshan commented 1 year ago

I've been working through multi.py and trying to make EmbeddedTensor support the Tensor operations used there. I'm looking at the reshape calls in multi_mv and multi_solve. Is it easy to explain how these are used? Is it just adding/removing dimensions of size 1? Does it ever even turn one dimension of size 6 into two dimensions of sizes 2 and 3, let alone something more nefarious? @davidweichiang

davidweichiang commented 1 year ago

If there are nonterminals $X_1, \dots, X_n$, where $X_i$ has shape $(d_{i,1}, \dots, d_{i,m_i})$, let $M = (\mathbb{R}^{d_{1,1}} \otimes \dots \otimes \mathbb{R}^{d_{1,m_1}}) \oplus \dots \oplus (\mathbb{R}^{d_{n,1}} \otimes \dots \otimes \mathbb{R}^{d_{n,m_n}})$. Then a MultiTensor is an element of $M$ or a linear transformation $M \rightarrow M$. The job of multi_mv(a, b) is to apply a to b, and the job of multi_solve is to solve the equation $x = ax + b$.

The purpose of the reshapes is to flatten one of the $\mathbb{R}^{d_{i,1}} \otimes \dots \otimes \mathbb{R}^{d_{i,m_i}}$ into a vector. So, it's not just squeezing/unsqueezing. Note that it's possible for $m_i = 0$, so this actually "flattens" a scalar into a 1-element vector.
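
Concretely, here is a minimal plain-torch sketch of those two operations (not the fggs implementation, which works over MultiTensors and presumably avoids densifying the whole map the way this does); storing the blocks in ordinary dicts keyed by nonterminal is an assumption of this sketch only:

```python
import torch

# Hypothetical block layout: b[X] is X's block with shape (d_{X,1}, ..., d_{X,m_X});
# a[X, Y] is the block of the linear map from Y's space to X's space, with
# shape = X's dims followed by Y's dims. All blocks share one dtype.

def multi_mv_sketch(a, b):
    """out[X] = sum_Y a[X, Y] applied to b[Y], flattening each block first."""
    out = {x: torch.zeros_like(t) for x, t in b.items()}
    for (x, y), block in a.items():
        vec = b[y].reshape(-1)                            # flatten Y's block into a vector
        mat = block.reshape(out[x].numel(), vec.numel())  # flatten the (X, Y) block into a matrix
        out[x] = out[x] + (mat @ vec).reshape(out[x].shape)
    return out

def multi_solve_sketch(a, b):
    """Solve x = a x + b by assembling one big (I - A) and solving it densely."""
    names = list(b)
    sizes = [b[x].numel() for x in names]     # a scalar block (m_i = 0) flattens to size 1
    offsets = {x: sum(sizes[:i]) for i, x in enumerate(names)}
    n = sum(sizes)
    big_b = torch.cat([b[x].reshape(-1) for x in names])
    big_a = torch.zeros(n, n, dtype=big_b.dtype)
    for (x, y), block in a.items():
        r, c = b[x].numel(), b[y].numel()
        big_a[offsets[x]:offsets[x] + r, offsets[y]:offsets[y] + c] = block.reshape(r, c)
    x_flat = torch.linalg.solve(torch.eye(n, dtype=big_b.dtype) - big_a, big_b)
    return {x: x_flat[offsets[x]:offsets[x] + sz].reshape(b[x].shape)
            for x, sz in zip(names, sizes)}
```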

ccshan commented 1 year ago

OK, so if I understand you correctly, the granularity of each $d_{i,j}$ is that of a letter in an einsum equation, not finer but also not coarser. I should be able to preserve sparsity through a multi_mv operation...

davidweichiang commented 1 year ago

Sorry I've been letting this sit so long. What would be needed to integrate this into the rest of the code? Are there particular things that you need me to check?

ccshan commented 1 year ago

Hi! Aria is doing two things (digging a tunnel from both ends):

  1. Write more tests for einsum
  2. Replace Tensor by EmbeddedTensor in MultiTensor and see what breaks

Maybe you can make a quick start on item 2? Maybe it will give rise to test failures that Aria can fix?

ccshan commented 1 year ago

Compared to the ✔️ changes on this identity branch, I'm a lot less confident about the changes I made in the multi_embedded branch to get the sum_product tests to even run. But on that branch, we are down to just one test failure now:

======================================================================
ERROR: test_autograd (test.test_sum_product.TestSumProduct.test_autograd) (example='test/simplefgg.json (p=1.0)')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ccshan/u/rational/fmitf/fggs/test/test_sum_product.py", line 108, in test_autograd
    self.assertTrue(torch.autograd.gradcheck(f, in_values, atol=1e-3))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ccshan/u/rational/fmitf/python/lib/python3.11/site-packages/torch/autograd/gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ccshan/u/rational/fmitf/python/lib/python3.11/site-packages/torch/autograd/gradcheck.py", line 1432, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/home/ccshan/u/rational/fmitf/python/lib/python3.11/site-packages/torch/autograd/gradcheck.py", line 1075, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/home/ccshan/u/rational/fmitf/python/lib/python3.11/site-packages/torch/autograd/gradcheck.py", line 1131, in _slow_gradcheck
    raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
numerical:tensor([[0.]], dtype=torch.float64)
analytical:tensor([[1.]], dtype=torch.float64)

davidweichiang commented 1 year ago

On a larger scale, I'm still wondering about the relationship of Tensor, Semiring, and EmbeddedTensor. Do we have it right? The fact that Semiring uses Tensors seems backwards, but it had to be that way because of the difficulty of subclassing Tensor to use other semirings. OTOH, EmbeddedTensor already goes to the trouble of (sort of) subclassing Tensor, which makes me wonder whether the relationship between EmbeddedTensor and Semiring could be put right. That is, maybe an EmbeddedTensor could contain a Semiring and a Tensor, so that when you, e.g., add two EmbeddedTensors, it looks up the addition operation from the Semiring.
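
A toy sketch of that last idea, with hypothetical names rather than the existing fggs Semiring or EmbeddedTensor classes: the wrapper carries a Semiring alongside its data and routes arithmetic through it.

```python
from dataclasses import dataclass
import torch

class Semiring:
    def add(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor: ...
    def mul(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor: ...

class RealSemiring(Semiring):
    def add(self, x, y): return x + y
    def mul(self, x, y): return x * y

class LogSemiring(Semiring):
    def add(self, x, y): return torch.logaddexp(x, y)
    def mul(self, x, y): return x + y

@dataclass
class SemiringTensor:
    semiring: Semiring
    data: torch.Tensor

    def __add__(self, other: "SemiringTensor") -> "SemiringTensor":
        # Quietly assumes self.semiring and other.semiring agree.
        return SemiringTensor(self.semiring, self.semiring.add(self.data, other.data))

    def __mul__(self, other: "SemiringTensor") -> "SemiringTensor":
        return SemiringTensor(self.semiring, self.semiring.mul(self.data, other.data))

# Usage: the same "+" means logaddexp under LogSemiring.
a = SemiringTensor(LogSemiring(), torch.zeros(3))
b = SemiringTensor(LogSemiring(), torch.zeros(3))
print((a + b).data)   # tensor([0.6931, 0.6931, 0.6931]) == log(2)
```

Note that `__add__` here just assumes both operands carry the same semiring, which leaves open what to do when they don't.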

ccshan commented 1 year ago

Currently EmbeddedTensor(t) converts t:Tensor into an EmbeddedTensor, and et.to_dense() converts et:EmbeddedTensor into a Tensor. We can try to make these conversions more implicit, but what kind of "using FGGs but not PERPL" are we trying to ease? Currently, the sum_product and viterbi code only uses EmbeddedTensor internally (but in order to support product and sum types, I'm working to change that), so it's a good time to think about what the interface should be (should sum_product sometimes return an EmbeddedTensor?). It might make sense to make sum_product and viterbi operate on an abstract TensorLike type, like some Semiring methods do.
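
For example, the explicit round trip described above would look roughly like this (the import path, and whether the round trip is exact, are assumptions of this sketch):

```python
import torch
from fggs.indices import EmbeddedTensor  # assumed to live in indices.py

t = torch.randn(2, 3)
et = EmbeddedTensor(t)        # explicit wrap of a dense Tensor
back = et.to_dense()          # explicit conversion back to a plain Tensor
assert torch.allclose(t, back)  # expected to hold if the round trip is lossless
```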

One reason to not put a Semiring inside (Embedded)Tensors is the binary method problem: what should t1+t2 do if t1 and t2 contain different Semirings?

If you could point out particular places in indices.py that are hard to understand, I'm sure there are plenty of comments to add...

davidweichiang commented 1 year ago

Since the word "embedding" is already overused in an ML context, are there any other names you can think of? StructuredTensor?

ccshan commented 1 year ago

> Since the word "embedding" is already overused in an ML context, are there any other names you can think of? StructuredTensor?

How about "pattern"?

davidweichiang commented 1 year ago

That works for me.

davidweichiang commented 1 year ago

Is it easy to explain why EmbeddingVar isn't just an int (0 for the first physical axis, 1 for the second physical axis, etc.)? And what Substs are used for?

ccshan commented 1 year ago

> Is it easy to explain why EmbeddingVar isn't just an int (0 for the first physical axis, 1 for the second physical axis, etc.)?

Is it enough to say that we want to distinguish between the first physical axis of one PatternedTensor and the first physical axis of another PatternedTensor?

> And what Substs are used for?

Unification is first and foremost used to implement einsum. For example, to einsum "ij,jk->ik" (in other words, matrix multiplication), we unify the second virtual axis of the first PatternedTensor against the first virtual axis of the second PatternedTensor, and that might add some entry to the Subst that maps some physical axis of the first PatternedTensor to some product of physical axes of the second PatternedTensor.
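
A toy illustration of both answers, using stand-in classes rather than the real EmbeddingVar and Subst in indices.py (which, as noted, can map an axis to a product of axes):

```python
class AxisVar:
    """Stand-in for EmbeddingVar: what matters is identity, not a numeric
    position, because two different tensors each have a "physical axis 0"
    and those must stay distinct as keys of a substitution."""
    def __init__(self, name, size):
        self.name, self.size = name, size
    def __repr__(self):
        return f"{self.name}[{self.size}]"

def unify(u, v, subst):
    """Record that two virtual axes must coincide; their sizes must agree."""
    assert u.size == v.size, "unification fails on mismatched sizes"
    subst[u] = v

# einsum "ij,jk->ik" (matrix multiplication): each operand gets its own
# axis variables, even though both operands have a first physical axis.
a_i, a_j = AxisVar("a.i", 2), AxisVar("a.j", 3)   # axes of the first operand
b_j, b_k = AxisVar("b.j", 3), AxisVar("b.k", 4)   # axes of the second operand

subst = {}                    # stands in for a Subst
unify(a_j, b_j, subst)        # the shared letter j links the two operands
print(subst)                  # {a.j[3]: b.j[3]}
```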

ccshan commented 1 year ago

Hmm, why are my last 7 commits to the identity branch (ending with 89e838d) not shown in the pull request? Anyway, I think I have addressed all your comments.