Open andreasgrv opened 1 week ago
In the code below, when the sum weight parameterisation is not specified, the result is a nan loss during training. This may be confusing for somebody not familiar with the internals of the library - is there a way to avoid this?

Output for:
```python
sum_weight_params = None  # This line leads to nan loss
circuit = Circuit.from_region_graph(
    rg,
    input_factory=input_factory,
    sum_weight_factory=sum_weight_params,
    num_sum_units=NUM_SUM_UNITS,
    sum_product='cp',
)
```
```
$ python example.py
Step 100: Average NLL: nan
Step 200: Average NLL: nan
Step 300: Average NLL: nan
```
On the other hand, if:
```python
sum_weight_init = name_to_initializer('normal')
sum_weight_params = name_to_parameter_factory('softmax', initializer=sum_weight_init)
circuit = Circuit.from_region_graph(
    rg,
    input_factory=input_factory,
    sum_weight_factory=sum_weight_params,
    num_sum_units=NUM_SUM_UNITS,
    sum_product='cp',
)
```
```
$ python example.py
Step 100: Average NLL: 3422.423
Step 200: Average NLL: 1614.733
Step 300: Average NLL: 1013.035
```
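For context, a minimal sketch in plain PyTorch (not the library's actual internals) of why the softmax factory avoids the problem: the Normal initializer only touches the raw parameters, and the effective sum weights are their softmax, which is strictly positive:

```python
import torch

raw = torch.randn(4)           # Normal init applies to the raw parameters only
w = torch.softmax(raw, dim=0)  # effective weights are strictly positive and sum to 1
assert (w > 0).all()           # safe to take log inside log-sum-exp
```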
This is due to the sum weights being initialized from a Normal by default, but they are expected to be positive in "common" circuits, and negative values generate nan in the log-sum-exp. However, we also have many projects using negative weights (with the `sum-product` or `complex-lse-sum` semiring), so it makes sense to use the Normal init.
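A minimal illustration in plain PyTorch (independent of the library) of the failure mode: under the log-sum-exp semiring a sum unit computes log Σᵢ wᵢ xᵢ as logsumexp(log wᵢ + log xᵢ), so a single negative weight makes the whole result nan:

```python
import torch

w = torch.tensor([0.7, -0.2])       # one negative weight, as a Normal init can produce
log_x = torch.tensor([-1.0, -2.0])  # log-probabilities from child units

# log(w) of a negative entry is nan, and it poisons the reduction
out = torch.logsumexp(w.log() + log_x, dim=0)
print(out)  # tensor(nan): the nan then propagates to the training loss
```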
> This may be confusing for somebody not familiar with the internals of the library - is there a way to avoid this?
Considering this, I would agree to change the default init for sum. But in any case, we should properly document the default init for each layer and tell users when they should NOT rely on the default.
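Until the default or the docs change, a defensive check along the following lines could surface the problem at construction time instead of as a nan loss. This is only a sketch: it assumes the compiled circuit is a standard `torch.nn.Module`, the helper name is made up, and a real check would have to target sum-layer weights specifically (input layers may legitimately have negative parameters):

```python
import torch

def assert_positive_sum_weights(module: torch.nn.Module) -> None:
    """Hypothetical guard: fail fast on negative parameter entries,
    which would produce nan under a log-sum-exp semiring.
    NOTE: a real version should filter to sum-layer weights only."""
    for name, param in module.named_parameters():
        if (param < 0).any():
            raise ValueError(
                f"Parameter {name!r} has negative entries; use a positive "
                "parameterization (e.g. softmax) for log-sum-exp semirings."
            )
```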