dlsyscourse / hw2


nn.Linear.bias shape inconsistency #8

Open navalnica opened 1 year ago


As noted in hw2.ipynb:

Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting.

And as noted in this Forum discussion, needle does not support implicit broadcasting because such broadcasts would not be tracked in the computational graph, which leads to wrong gradient computations in the backward pass.
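To see why the broadcast has to appear as a graph node: the gradient of a broadcast is a reduction (sum) over the broadcast axes, so if the broadcast is implicit there is no node whose backward performs that reduction. A minimal sketch of this in plain numpy (not needle code; needle's actual op names may differ):

```python
import numpy as np

batch, out_features = 4, 3

# Forward: bias of shape (1, out_features) is explicitly broadcast
# to the output shape (batch, out_features) before the addition.
bias = np.zeros((1, out_features))
x = np.ones((batch, out_features))
out = x + np.broadcast_to(bias, (batch, out_features))

# Backward: the incoming gradient has shape (batch, out_features).
# The broadcast node's backward must sum over the broadcast axis
# to recover a gradient with bias's own shape -- this is the step
# that is silently skipped if the broadcast is implicit.
grad_out = np.ones((batch, out_features))
grad_bias = grad_out.sum(axis=0, keepdims=True)  # shape (1, out_features)
assert grad_bias.shape == bias.shape
```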

As a result, we need to explicitly broadcast the bias tensor in nn.Linear.forward().

I think there should be no restriction on the shape used to store the bias term after initialization, since we broadcast it explicitly during the forward pass anyway. We can store the bias term either as a 2-D tensor of shape (1, out_features) or as a 1-D tensor of shape (out_features,).
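Either storage choice works as long as forward() normalizes the bias shape and broadcasts explicitly. A hedged sketch in plain numpy (standing in for needle's reshape/broadcast_to ops; function and variable names here are illustrative, not from the course repo):

```python
import numpy as np

def linear_forward(X, W, b):
    """Linear layer forward pass that accepts bias stored as either
    (out_features,) or (1, out_features)."""
    out = X @ W                               # (batch, out_features)
    b2d = b.reshape((1, out.shape[1]))        # normalize bias to 2-D
    return out + np.broadcast_to(b2d, out.shape)  # explicit broadcast

X = np.ones((4, 5))
W = np.ones((5, 3))
# Both storage shapes produce the same (4, 3) output.
for b in (np.ones(3), np.ones((1, 3))):
    assert linear_forward(X, W, b).shape == (4, 3)
```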

However, there is an ambiguity in test_nn_and_optim.py.

I think the test_nn_linear_bias_init_1() test should accept either of the valid shapes, (1, out_features) or (out_features,), for the bias.