Open astonzhang opened 4 years ago
http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_deep-learning-computation/parameters.html
1.
mxnet:
print(net[0].collect_params())
print(net.collect_params())

dense0_ (
  Parameter dense0_weight (shape=(8, 4), dtype=float32)
  Parameter dense0_bias (shape=(8,), dtype=float32)
)
sequential0_ (
  Parameter dense0_weight (shape=(8, 4), dtype=float32)
  Parameter dense0_bias (shape=(8,), dtype=float32)
  Parameter dense1_weight (shape=(1, 8), dtype=float32)
  Parameter dense1_bias (shape=(1,), dtype=float32)
)
tf:
print(net.layers[1].weights)
print(net.get_weights())

[<tf.Variable 'sequential/dense/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[ 0.5524189 ,  0.23129171,  0.0363729 , -0.8600636 ],
       [-0.69835407, -0.06596345,  0.01897395, -0.5417439 ],
       [ 0.54055935,  0.6689728 , -0.8319559 , -0.09743792],
       [-0.1610511 ,  0.49009317, -0.61211747, -0.45042837]], dtype=float32)>,
 <tf.Variable 'sequential/dense/bias:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>]
[array([[ 0.5524189 ,  0.23129171,  0.0363729 , -0.8600636 ],
        [-0.69835407, -0.06596345,  0.01897395, -0.5417439 ],
        [ 0.54055935,  0.6689728 , -0.8319559 , -0.09743792],
        [-0.1610511 ,  0.49009317, -0.61211747, -0.45042837]], dtype=float32),
 array([0., 0., 0., 0.], dtype=float32),
 array([[ 0.0090847 ],
        [-1.0178163 ],
        [-0.29936522],
        [-0.6696218 ]], dtype=float32),
 array([0.], dtype=float32)]
For instance, bias is missing.
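One way to bring the tf snippet closer to the mxnet collect_params() listing (which shows every parameter, including the biases, as names and shapes rather than full values) would be to iterate over the variables and print only name/shape. A minimal sketch, assuming a Flatten -> Dense(4) -> Dense(1) model inferred from the (4, 4) kernel shape above, not the book's exact code:

import tensorflow as tf

# Assumed reconstruction of the section's model: 4 -> 4 -> 1 MLP.
net = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation=tf.nn.relu),
    tf.keras.layers.Dense(1),
])
net(tf.random.uniform((2, 4)))  # run once so the variables are created

# Analogue of print(net[0].collect_params()): first Dense layer only.
for v in net.layers[1].weights:
    print(v.name, v.shape, v.dtype.name)

# Analogue of print(net.collect_params()): every parameter in the network.
for v in net.weights:
    print(v.name, v.shape, v.dtype.name)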
2.
mxnet:
net.collect_params()['dense1_bias'].data()

array([0.])
tf:
net.get_weights()[1]

array([0., 0., 0., 0.], dtype=float32)
The tf output does not look like the corresponding bias: mxnet's dense1_bias has a single element, while net.get_weights()[1] is the hidden layer's bias with four elements.
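If the intent is to show the output layer's bias, as the mxnet dense1_bias line does, the tf snippet needs a different index. A minimal sketch, assuming the same Flatten -> Dense(4) -> Dense(1) model as above, where get_weights() returns the variables in layer order [kernel0, bias0, kernel1, bias1]:

# Output layer's bias, matching net.collect_params()['dense1_bias'].data():
print(net.get_weights()[3])       # -> array([0.], dtype=float32)
print(net.layers[2].weights[1])   # same variable, accessed through the layer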
3.
mxnet:
class MyInit(init.Initializer):
    def _init_weight(self, name, data):
        print('Init', name, data.shape)
        data[:] = np.random.uniform(-10, 10, data.shape)
        data *= np.abs(data) >= 5

net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[0:2]

Init dense0_weight (8, 4)
Init dense1_weight (1, 8)
array([[ 0.       , -0.       , -0.       ,  8.522827 ],
       [ 0.       , -8.828651 , -0.       , -5.6012006]])
tf:
print(net.layers[1].weights[0])

<tf.Variable 'sequential_6/dense_13/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[0.02371812, 0.67190015, 0.40087283, 0.56996346],
       [0.42595625, 0.5223805 , 0.34758675, 0.5847038 ],
       [0.22081661, 0.97955835, 0.9585841 , 0.5245316 ],
       [0.59826577, 0.59225726, 0.25385475, 0.30986   ]], dtype=float32)>
a) The print('Init', name, data.shape) output has no counterpart in the tf version.
b) The tf snippet should slice the first two rows ([0:2]) of the weight, as mxnet's net[0].weight.data()[0:2] does.
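To mirror both points, the tf initializer could print the shape it receives, and the inspection line could slice the first two rows. A minimal sketch of such an initializer (the MyInit name and print format just mirror the mxnet snippet; this is not the book's current tf code, and only the shape is available inside a Keras initializer, not a parameter name):

import tensorflow as tf

class MyInit(tf.keras.initializers.Initializer):
    def __call__(self, shape, dtype=None):
        if dtype is None:
            dtype = tf.float32
        print('Init', shape)  # shape only; the parameter name is not passed in
        data = tf.random.uniform(shape, -10, 10, dtype=dtype)
        # Keep only entries with magnitude >= 5, zero out the rest.
        return data * tf.cast(tf.abs(data) >= 5, dtype)

net = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation=tf.nn.relu, kernel_initializer=MyInit()),
    tf.keras.layers.Dense(1, kernel_initializer=MyInit()),
])
net(tf.random.uniform((2, 4)))

# Mirror net[0].weight.data()[0:2]: print only the first two rows.
print(net.layers[1].weights[0][0:2])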
4.
mxnet:
net[0].weight.data()[0]

array([42.      ,  1.      ,  1.      ,  9.522827])
tf:
net.layers[1].weights[0]

<tf.Variable 'sequential_6/dense_13/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[42.       ,  1.6719002,  1.4008728,  1.5699635],
       [ 1.4259562,  1.5223805,  1.3475868,  1.5847038],
       [ 1.2208166,  1.9795583,  1.9585841,  1.5245316],
       [ 1.5982658,  1.5922573,  1.2538548,  1.30986  ]], dtype=float32)>
The outputs have different shapes: mxnet prints a single row of the weight (net[0].weight.data()[0]), while tf prints the entire kernel.
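If the goal is for the two outputs to have the same shape, the tf snippet could print just the first row of the kernel after the in-place updates. A minimal sketch; the assign lines are an assumption about how the values above were produced (42 written at [0, 0], all entries incremented by 1), not the book's exact code:

# After something like:
#   net.layers[1].weights[0][:].assign(net.layers[1].weights[0] + 1)
#   net.layers[1].weights[0][0, 0].assign(42)
# print only the first row, matching mxnet's net[0].weight.data()[0]:
print(net.layers[1].weights[0][0])   # -> [42. , 1.67..., 1.40..., 1.57...]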
Reference fix for PyTorch https://github.com/d2l-ai/d2l-en/pull/1235. I'll take a look at the TF version later.