As noted in hw2.ipynb: "Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting."
As noted in this Forum discussion, needle does not support implicit broadcasting because such broadcasts are not tracked in the computational graph, which leads to wrong gradient computations in the backward pass.
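To see why an untracked broadcast corrupts gradients, here is a minimal sketch in plain NumPy (needle itself is not used; the point is the sum-over-the-broadcast-axis rule that a tracked broadcast op performs in backward):

```python
import numpy as np

# Toy illustration: for y = broadcast_to(b, (batch, out_features)) and
# loss = y.sum(), the gradient w.r.t. b must SUM the upstream gradient
# over the broadcast (batch) axis. If the broadcast was never recorded
# in the graph, backward has no node at which to perform this reduction,
# so b receives a gradient of the wrong shape or wrong values.
batch, out_features = 4, 3
upstream = np.ones((batch, out_features))     # dL/dy for loss = y.sum()
grad_b = upstream.sum(axis=0, keepdims=True)  # correct dL/db, shape (1, 3)
print(grad_b)                                 # [[4. 4. 4.]]
```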
As a result, we need to explicitly broadcast the bias tensor in nn.Linear.forward().
I guess there should be no restriction on the shape used to store the bias term after initialization, since we broadcast it explicitly during the forward pass anyway. We can store the bias either as a 2D tensor of shape (1, out_features) or as a 1D tensor of shape (out_features,).
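A minimal NumPy sketch of what such a shape-agnostic forward pass could look like; the helper name linear_forward_sketch is hypothetical, and the NumPy reshape/broadcast_to calls stand in for needle's graph-tracked ops, so this is an assumption-laden illustration, not the course's reference implementation:

```python
import numpy as np

def linear_forward_sketch(X, W, b):
    # hypothetical stand-in for nn.Linear.forward; in needle these
    # reshape/broadcast steps would be graph-tracked tensor ops
    out = X @ W
    # normalize the stored bias to (1, out_features) first, so either
    # stored shape -- (out_features,) or (1, out_features) -- works,
    # then broadcast explicitly to the output shape
    b2d = b.reshape(1, -1)
    return out + np.broadcast_to(b2d, out.shape)

X = np.ones((2, 3))
W = np.ones((3, 4))
b = np.arange(4.0)                  # bias stored as (out_features,)
print(linear_forward_sketch(X, W, b))
# same result if b is stored as (1, out_features), i.e. b.reshape(1, 4)
```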
However, there is an ambiguity in test_nn_and_optim.py: test_nn_linear_bias_init_1() asserts that the bias is initialized as a 2D tensor of shape (1, out_features), but linear_forward() and linear_backward() (used by the test_nn_linear_forward_* and test_nn_linear_backward_* tests, respectively) assign a 1D tensor of shape (out_features,) to the bias:

f.bias.data = get_tensor(lhs_shape[-1])

A side question: why do we call get_tensor to assign a new value to the bias at all? Couldn't we use the f.bias value that was created during initialization?
I think test_nn_linear_bias_init_1() should accept either of the valid shapes, (1, out_features) or (out_features,).
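One way the test could be relaxed; check_bias_shape below is a hypothetical helper (not part of test_nn_and_optim.py), assuming only that the bias exposes a NumPy-style .shape:

```python
import numpy as np

def check_bias_shape(bias, out_features):
    # accept either valid storage shape for the bias (a hypothetical
    # relaxation of test_nn_linear_bias_init_1)
    assert bias.shape in ((1, out_features), (out_features,)), bias.shape
    # flatten before comparing values, so value checks stay shape-agnostic
    return np.asarray(bias).reshape(-1)

check_bias_shape(np.zeros((1, 5)), 5)   # passes: 2D storage
check_bias_shape(np.zeros(5), 5)        # passes: 1D storage
```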