d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

adaptation inconsistency: chapter_deep-learning-computation/parameters #1223

Closed astonzhang closed 4 years ago

astonzhang commented 4 years ago

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_deep-learning-computation/parameters.html

  1. mxnet:

print(net[0].collect_params())
print(net.collect_params())

dense0_ (
  Parameter dense0_weight (shape=(8, 4), dtype=float32)
  Parameter dense0_bias (shape=(8,), dtype=float32)
)
sequential0_ (
  Parameter dense0_weight (shape=(8, 4), dtype=float32)
  Parameter dense0_bias (shape=(8,), dtype=float32)
  Parameter dense1_weight (shape=(1, 8), dtype=float32)
  Parameter dense1_bias (shape=(1,), dtype=float32)
)

pytorch:

print(net[1].state_dict())
print(net.state_dict())

OrderedDict()
OrderedDict([('0.weight', tensor([[ 0.2233,  0.1815, -0.1880,  0.1780],
        [ 0.1493, -0.4033, -0.3357, -0.1170],
        [-0.4171, -0.2477, -0.4834, -0.2077],
        [-0.4015,  0.2357,  0.1285,  0.4564],
        [-0.4385, -0.2682, -0.0510, -0.2132],
        [-0.1044,  0.4734,  0.1390,  0.2341],
        [-0.2781,  0.2203,  0.4285, -0.4425],
        [ 0.1697,  0.0497,  0.0042, -0.2616]])), ('0.bias', tensor([ 0.3599, -0.4421, -0.1519,  0.1739, -0.2889,  0.1194,  0.4794,  0.4822])), ('2.weight', tensor([[ 0.1173,  0.3268,  0.3000, -0.2517, -0.2242,  0.0704, -0.1405, -0.3193]])), ('2.bias', tensor([-0.1843]))])

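For instance, net[1].state_dict() returns an empty OrderedDict(), which is inconsistent with the mxnet output. The mxnet version prints net[0], the first dense layer, whereas net[1] in the pytorch version appears to be a parameter-free layer (the full state_dict only has keys under 0 and 2).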
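A minimal sketch of the mismatch, assuming the PyTorch network is `nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))`. The layer layout is inferred from the parameter shapes and the `0.*` / `2.*` keys printed above; it is not quoted from the chapter. Indexing the first layer rather than the activation gives output comparable to the mxnet `net[0].collect_params()` call:

```python
import torch
from torch import nn

# Assumed architecture, inferred from the shapes and keys in the issue.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# net[1] is the ReLU here; it owns no parameters, so its state_dict is empty.
relu_keys = list(net[1].state_dict().keys())

# net[0] is the first Linear layer, the counterpart of the mxnet net[0].
first_keys = list(net[0].state_dict().keys())

# The whole network exposes keys prefixed with the layer index.
all_keys = list(net.state_dict().keys())

print(relu_keys)
print(first_keys)
print(all_keys)
```

Under that assumption, printing `net[0].state_dict()` in the PyTorch tab would line up with the mxnet tab.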

mxnet:

class MyInit(init.Initializer):
    def _init_weight(self, name, data):
        print('Init', name, data.shape)
        data[:] = np.random.uniform(-10, 10, data.shape)
        data *= np.abs(data) >= 5

net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[0:2]

Init dense0_weight (8, 4)
Init dense1_weight (1, 8)
array([[ 0.       , -0.       , -0.       ,  8.522827 ],
       [ 0.       , -8.828651 , -0.       , -5.6012006]])

pytorch:

def my_init(m):
    if type(m) == nn.Linear:
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[0:2]

tensor([[ 7.4014, -8.7963,  0.0000, -6.2305],
        [-6.9865,  0.0000, -0.0000, -0.0000]], grad_fn=<SliceBackward>)

Can we print the equivalent of print('Init', name, data.shape) in the pytorch version?
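One possible way to get a comparable log line, sketched under assumptions: `net.apply` does not pass a parameter name to the callback, so this version loops over `named_parameters()` instead. The helper name `my_init_named`, the `log` list, and the network definition are hypothetical and are not taken from the chapter or from the eventual fix.

```python
import torch
from torch import nn

# Assumed architecture, inferred from the shapes shown in the issue.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

log = []  # collected messages, kept so the output can be inspected later

def my_init_named(model):
    """Hypothetical helper: custom init that reports each weight it touches."""
    for name, param in model.named_parameters():
        if not name.endswith('weight'):
            continue  # like the mxnet _init_weight hook, skip biases
        message = f"Init {name} {tuple(param.shape)}"
        print(message)
        log.append(message)
        with torch.no_grad():
            # Uniform in [-10, 10], then zero out entries with |w| < 5.
            nn.init.uniform_(param, -10, 10)
            param *= param.abs() >= 5

my_init_named(net)
print(net[0].weight[0:2])
```

The names printed are PyTorch's index-based keys (`0.weight`, `2.weight`) rather than mxnet's `dense0_weight` style, so the two tabs would match in structure but not in exact text.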

AnirudhDagar commented 4 years ago

@astonzhang Consistency fix in #1235