ITensor / ITensorNetworks.jl

A package with general tools for working with higher-dimensional tensor networks based on ITensor.
MIT License

Implement `inner` using BP #147

Closed JoeyT1994 closed 2 months ago

JoeyT1994 commented 3 months ago

This PR refactors the `inner` interface in ITensorNetworks.jl. Most importantly, it adds algorithmic backends; `"bp"` and `"exact"` are currently supported.

For instance, `inner(x::AbstractITensorNetwork, A::AbstractITensorNetwork, y::AbstractITensorNetwork; alg=default_alg)` and `inner(x::AbstractITensorNetwork, y::AbstractITensorNetwork; alg=default_alg)` will default to `"bp"` with a sensible partitioning if all the arguments are trees, and to `"exact"` otherwise.
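A minimal usage sketch of the refactored interface (the lattice and state construction below follows the package's usual patterns, but the exact constructor names and defaults are assumptions, not part of this PR's diff):

```julia
using ITensors, ITensorNetworks
using NamedGraphs: named_grid

g = named_grid((3, 3))                     # 3x3 grid (not a tree)
s = siteinds("S=1/2", g)
x = random_tensornetwork(s; link_space=2)
y = random_tensornetwork(s; link_space=2)

inner(x, y; alg="exact")  # exact contraction of the overlap network
inner(x, y; alg="bp")     # belief-propagation approximation
inner(x, y)               # default: "bp" on trees, "exact" otherwise
```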

The assumption in `inner(...)` is that the external indices of the arguments are all aligned, so that the networks stack properly under `disjoint_union` and don't need mapping; however, an optional link-index-map kwarg (defaulting to `sim`) is provided for `x` to avoid internal index clashes. This is consistent with the current interface in ITensors.jl.

Logarithmic versions of `inner`, i.e. `loginner()`, are also provided. In the `"bp"` case this is neatly defined and helps to avoid overflow/underflow. In the `"exact"` case it just calls `log(inner(...))`, so it is perhaps a bit redundant.
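A quick sketch of where `loginner` matters, continuing the setup above (the scaling comment is generic, not a specific claim from this PR):

```julia
# For a network of many tensors the overlap can exceed floating-point range,
# so accumulate it in log space instead of exponentiating:
logval = loginner(x, y; alg="bp")  # log(<x|y>), accumulated factor by factor
val = exp(logval)                  # only safe when the overlap fits in Float64
```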

Testing is added via `test_inner.jl`, and the other standard tests that rely on `inner` now dispatch to it and all pass. Additionally, the tests in `test_treetensornetworks` dispatch to this version rather than the one in `src/treetensornetworks/abstracttreetensornetwork.jl`, unless the tensor network is explicitly wrapped in the `TTN` type.

A few points of note and things to think about resolving:

  1. If all the elements of a tensor network `x` are 0, so that `norm_sqr(x::AbstractITensorNetwork) = 0`, then the BP method returns `NaN` because it computes the ratio 0/0 (see the snippet after this list). This edge case is not well determined by the BP code, because the current BP backend computes `inner()` via a global method. A local method can be defined for a tree to solve this (it is how Benedikt's code in `src/treetensornetworks/abstracttreetensornetwork.jl` works), and we could then define `alg = "bp_global"` and `alg = "bp_local_tree"` to let the user choose between them.
  2. Currently `inner(alg::Algorithm"exact", ...)` passes to the `inner_network(...)` constructor, which I cleaned up a bit: `inner_network` now forwards to `flatten_networks`, but takes the `dag` of the first argument, and its keyword arguments default to no flattening or combining (useful kwargs for `"exact"`). `inner(alg::Algorithm"bp", ...)` instead builds the relevant `ITensorNetwork` internally via `disjoint_union`, because the `flatten_networks` builder produces confusing vertex names for more than two networks, which makes it difficult to define a good `default_partitioning`.
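A hypothetical repro sketch of the 0/0 edge case in point 1 (assuming `setindex!` on the network and the `norm_sqr`/`vertices` helpers behave as their names suggest):

```julia
z = random_tensornetwork(s; link_space=2)
for v in vertices(z)
  z[v] = 0 * z[v]        # zero out every tensor
end
norm_sqr(z)              # 0
inner(z, z; alg="bp")    # NaN: the global BP estimate reduces to 0/0
```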

@mtfishman @b-kloss

mtfishman commented 3 months ago

Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?

mtfishman commented 3 months ago

Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: https://github.com/mtfishman/ITensorNetworks.jl/pull/144#issuecomment-1983581282, in case that is relevant.

JoeyT1994 commented 3 months ago

> Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: #144 (comment), in case that is relevant.

Hmm yeah, I'm not sure what the bug is here because I don't see it locally on my machine. I thought something had changed in ITensors.jl so that it was importing something that no longer exists. Let me try to find the issue.

JoeyT1994 commented 3 months ago

> Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?

Yeah, I wonder: if the norm of the message tensors is zero, does that guarantee (assuming you didn't initialize the message tensors to 0) that the norm of the wave function is zero? I assume so. In that case we could add that catch in the `inner(alg::Algorithm"bp", ...)` code?

mtfishman commented 3 months ago

> > Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?
>
> Yeah, I wonder: if the norm of the message tensors is zero, does that guarantee (assuming you didn't initialize the message tensors to 0) that the norm of the wave function is zero? I assume so. In that case we could add that catch in the `inner(alg::Algorithm"bp", ...)` code?

I suppose a message could accidentally end up zero without the norm of the network being zero, but that seems unlikely, and it would be a problem for BP anyway: that message would always remain zero and cause other messages to become zero during updates involving it. A fix for that situation could be to randomize the message and see whether it ends up back at zero or not. But my point is that this seems like something we should try to handle at the level of the BP code rather than in higher-level code like `inner`, since it could be a potential issue for many functions that use BP, not just `inner`.

mtfishman commented 3 months ago

I think for running BP on a norm network you can probably show that the only way a message can end up zero is if the network is zero or the message is initialized to zero, since each update has the form of a CP map.
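Spelling out the reasoning: on a norm network each message update sandwiches the incoming message between a site tensor and its conjugate, which in operator form is a Kraus/CP map,

$$\mathcal{E}(M) = \sum_k A_k \, M \, A_k^\dagger ,$$

so a positive semidefinite message stays positive semidefinite and can only map to zero if the $A_k$ (i.e. the local tensors) vanish or $M$ was zero to begin with.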

mtfishman commented 3 months ago

> > Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: #144 (comment), in case that is relevant.
>
> Hmm yeah, I'm not sure what the bug is here because I don't see it locally on my machine. I thought something had changed in ITensors.jl so that it was importing something that no longer exists. Let me try to find the issue.

It may be related to https://github.com/ITensor/ITensors.jl/pull/1356: we moved all of the MPS/MPO code into an ITensorMPS submodule of ITensors.jl, which changed the namespaces of some internal functions, probably including `check_hascommoninds` (i.e. `check_hascommoninds` probably isn't available from ITensors anymore and instead must be accessed from `ITensors.ITensorMPS`).
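If that's the cause, the import-side fix would look roughly like this (a sketch; whether `check_hascommoninds` is actually reachable at this path after #1356 is an assumption):

```julia
# Before ITensor/ITensors.jl#1356 this worked:
#   using ITensors: check_hascommoninds
# After the move into the ITensorMPS submodule it would need to be:
using ITensors.ITensorMPS: check_hascommoninds
```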

But if we have to work around issues with `check_hascommoninds` anyway, I think we may as well just remove it, since I don't think it is serving a good purpose in ITensorNetworks.

JoeyT1994 commented 3 months ago

Okay, I can remove it. Shall I also remove the calls to `check_hascommoninds` in the treetensornetwork code? Presumably it is just being used for some assertions.

mtfishman commented 3 months ago

> Okay, I can remove it. Shall I also remove the calls to `check_hascommoninds` in the treetensornetwork code? Presumably it is just being used for some assertions.

Yeah, thanks! Can you also remove `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)`?

JoeyT1994 commented 3 months ago

@mtfishman actually this seems to be happening with more than just `check_hascommoninds` due to the ITensors.jl changes; I can try to hunt the failures down.

mtfishman commented 3 months ago

Seems like we may as well just remove src/treetensornetworks/solvers/deprecated entirely instead of trying to update it.

mtfishman commented 3 months ago

Also src/treetensornetworks/deprecated_opsum_to_ttn.jl.

mtfishman commented 3 months ago

@JoeyT1994 reconsidering this, I'm concerned this PR is going to get messy with all of these other changes. Do you mind if I make a PR updating to the latest ITensor version, removing deprecated code, removing check_hascommoninds, etc. and then you can merge that into this PR?

JoeyT1994 commented 3 months ago

> I think for running BP on a norm network you can probably show that the only way a message can end up zero is if the network is zero or the message is initialized to zero, since each update has the form of a CP map.

Yes, you are right that if it is a CP map, a message being zero directly implies one of those two things. Perhaps with a non-CP map there is a possibility the outgoing message ends up zero by virtue of the incoming message being a zero eigenvector of the local tensor, but that seems a weird case and likely implies the tensor contracts to 0 too.

In principle, if a message is 0 then the BP code should rapidly converge to 0 within a few sweeps (as all the other messages end up zero), and that tells us something about the tensor network. So it seems to me this isn't actually something the BP code needs to catch; it is part of the algorithm. The exception is if we are trying to normalize the message tensors, in which case we should flag an error?
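A sketch of the error flag suggested above (hypothetical helper, not the package's actual API):

```julia
using ITensors: ITensor, norm

function normalize_message(M::ITensor; atol::Real=0.0)
  n = norm(M)
  # A zero-norm message can't be normalized; per the discussion above this
  # signals that the network itself (or the initialization) is zero.
  n > atol || error("message tensor has zero norm; network may contract to zero")
  return M / n
end
```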

JoeyT1994 commented 3 months ago

@mtfishman I have updated the PR based on what we spoke about on Wednesday.

  1. `inner(x, A, y)` passes to `BiLinearForm(A, x, y)`, which does the mapping on `x` to bring it into the dual space.
  2. `logscalar()` is now generic to any algorithm whose cache has `scalar_factors(cache)` defined on it (see the sketch after this list).
  3. The new namespace convention is used, and the tests that `inner` applies to have been updated accordingly.
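A hedged sketch of the generic pattern in point 2 (the name `scalar_factors` is from this PR, but the return shape assumed here, lists of numerator and denominator factors, is purely for illustration):

```julia
# Illustrative shape only: any cache type defining scalar_factors(cache)
# would get logscalar for free by summing logs of the per-region factors,
# which avoids overflow/underflow in the product.
function logscalar(cache)
  numerators, denominators = scalar_factors(cache)
  return sum(log, numerators) - sum(log, denominators)
end
```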

Let me know your thoughts.

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 69.73684%, with 46 lines in your changes missing coverage. Please review.

Project coverage is 82.32%. Comparing base (ae4ad2c) to head (0026623). Report is 1 commit behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/ITensorsExtensions/ITensorsExtensions.jl | 0.00% | 31 Missing |
| src/inner.jl | 84.21% | 6 Missing |
| src/contract.jl | 84.00% | 4 Missing |
| src/expect.jl | 50.00% | 3 Missing |
| src/abstractitensornetwork.jl | 85.71% | 1 Missing |
| src/formnetworks/quadraticformnetwork.jl | 66.66% | 1 Missing |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
+ Coverage   81.89%   82.32%   +0.43%
==========================================
  Files          72       74       +2
  Lines        3744     3769      +25
==========================================
+ Hits         3066     3103      +37
+ Misses        678      666      -12
```


JoeyT1994 commented 2 months ago

@mtfishman let me know if you have any further comments or if you are happy to merge

mtfishman commented 2 months ago

I'll take another look through the PR.

mtfishman commented 2 months ago

@JoeyT1994 I merged #155 into main and then tried merging main into this branch and fixing the conflicts, but it seems like that broke some BP tests... I had to fix some issues in the tests in #155 because of changes to the normalization of certain tensor network constructors, but I'm not sure if that is related.

JoeyT1994 commented 2 months ago

Okay, no worries. Let me pull that merge to my local machine and try to resolve the test failures.

mtfishman commented 2 months ago

Thanks. That change was supposed to be pretty benign but perhaps the behavior of some of the constructors changed accidentally.

JoeyT1994 commented 2 months ago

Yeah, the issue is that `scalar(alg::Algorithm"bp", tn::AbstractITensorNetwork, ...)` no longer seems to return the correct answer. Can't quite figure out why yet.

JoeyT1994 commented 2 months ago

Okay, actually I don't think it's a BP issue. The `inner_network` constructor (which passes to `BiLinearFormNetwork` and inserts an identity operator in the middle) is now doing something different than it was, i.e.

```julia
ψ = random_tensornetwork(s; link_space=χ)
contract(inner_network(ψ, ψ)) == contract(ψ ⊗ prime(dag(ψ); sites=[]))
```

now gives false.

JoeyT1994 commented 2 months ago

I see where the issue is: it's the `delta_network(union_all_inds(siteinds(ket), dual_site_index_map(siteinds(ket))))` constructor. It appears to be inserting matrices like

```
1 0
0 0
```

on the vertices instead of identity matrices.
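For contrast, a plain two-index `delta` in ITensors does give the identity between two same-dimension indices:

```julia
using ITensors

i = Index(2, "site")
j = Index(2, "site")
matrix(delta(i, j))  # 2×2 identity, not the [1 0; 0 0] projector above
```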

mtfishman commented 2 months ago

I think that may have been an incorrect usage of delta_network, see my messages on Slack.