april-tools / cirkit

A Python framework to build, learn, and reason about probabilistic circuits and tensor networks
https://cirkit-docs.readthedocs.io/en/latest/
GNU General Public License v3.0

Circuit sum layer outputs NaN when categorical layer has fixed parameters in LSE-sum semiring #319

Closed · n28div closed this issue 3 days ago

n28div commented 1 week ago

When using a fixed parametrization on categorical layers, the output of MixingLayer is NaN for some combinations of the inputs. The issue seems to be related to the sum layer itself, and in particular to the sum parameters.

By fixing the sum parameters to be unitary and picking the LSE-sum semiring, the result is wrong when both variables are zero, but it is fine when either of them is non-zero. Minimal reproducing code:

import numpy as np
import torch
import random

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)

from cirkit.symbolic.layers import CategoricalLayer, MixingLayer, HadamardLayer
from cirkit.symbolic.parameters import Parameter, ConstantParameter
from cirkit.utils.scope import Scope
from cirkit.pipeline import PipelineContext
from cirkit.symbolic.circuit import Circuit

# Fixed categorical parameters: probability 0 on category 0 and probability 1
# on category 1, so the log-probability of category 0 is -inf in log-space.
probs = lambda: Parameter.from_input(ConstantParameter(1, 1, 2, value=np.array([0.0, 1.0]).reshape(1, 1, -1)))

cl_1 = CategoricalLayer(Scope([0]), 1, 1, num_categories=2, probs=probs())
cl_2 = CategoricalLayer(Scope([1]), 1, 1, num_categories=2, probs=probs())

# Mixing (sum) layer over the two categorical units, with randomly initialized weights.
sum_layer = MixingLayer(1, 2)

# Compile with the torch backend, evaluating the circuit in the LSE-sum (log-space) semiring.
ctx = PipelineContext(backend='torch', fold=False, optimize=False, semiring='lse-sum')

# Symbolic circuit: the mixing layer takes both categorical layers as inputs
# and is the only output.
symbolic_circuit = Circuit(
    1,
    [cl_1, cl_2, sum_layer],
    { sum_layer: [cl_1, cl_2] },
    [sum_layer]
)
circuit = ctx.compile(symbolic_circuit)

print(circuit(torch.tensor([0, 0]).reshape(1, 1, 2)))
# >>> tensor([[[nan]]], grad_fn=<TransposeBackward0>)

print(circuit(torch.tensor([0, 1]).reshape(1, 1, 2)))
# >>> tensor([[[nan]]], grad_fn=<TransposeBackward0>)

print(circuit(torch.tensor([1, 0]).reshape(1, 1, 2)))
# >>> tensor([[[0.4324]]], grad_fn=<TransposeBackward0>)

print(circuit(torch.tensor([1, 1]).reshape(1, 1, 2)))
# >>> tensor([[[0.2212]]], grad_fn=<TransposeBackward0>)

Changing the random seed changes the results (e.g., with seed 1 only the first evaluation is NaN), and the same happens when fixing the sum parameters by replacing

sum_layer = MixingLayer(1, 2)

with

sum_layer = MixingLayer(1, 2, weight_factory=lambda n: Parameter.from_input(ConstantParameter(*n, value=np.ones(n))))

I tracked down the error and it appears to be in cirkit.backend.torch.semiring.LSESumSemiring, in the method apply_reduce: when both input values are 0 the variable xs is (tensor([[[[-inf]], [[-inf]]]]),). On line 375 the subtraction between two -inf values is an undefined operation and produces NaN.
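
For reference, the failure mode can be reproduced in isolation. The sketch below assumes apply_reduce shifts by the maximum before exponentiating (the usual log-sum-exp trick); the helper lse_max_shift is hypothetical and only illustrates the arithmetic, it is not cirkit's code. When every summand is -inf (i.e., all mixture components have zero probability), the shift computes (-inf) - (-inf) = nan, whereas torch.logsumexp special-cases an all--inf reduction and returns -inf.

import torch

# Hypothetical max-shift log-sum-exp, illustrating the suspected arithmetic.
def lse_max_shift(x, dim):
    m = x.max(dim=dim, keepdim=True).values
    # If every entry of x is -inf, then m is -inf and x - m is (-inf) - (-inf) = nan,
    # which then propagates through exp, sum, and log.
    return (x - m).exp().sum(dim=dim).log() + m.squeeze(dim)

xs = torch.tensor([float("-inf"), float("-inf")])
print(lse_max_shift(xs, dim=0))    # tensor(nan)
print(torch.logsumexp(xs, dim=0))  # tensor(-inf)

If that matches what line 375 does, guarding the shift (e.g., zeroing out a non-finite maximum before subtracting) or reducing with torch.logsumexp directly would avoid the NaN.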