apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Gluon] Gluon load_parameters/save_parameters deduplicate by default #18116

Open sxjscience opened 4 years ago

sxjscience commented 4 years ago

I find that the load/save logic in Gluon does not respect the prefix in the network.

Consider the following example: I create two networks, Foo and Foo2, each of which has a single dense layer with prefix='layer_', but the layer is stored under a different attribute name in each network (self.l1 in Foo and self.l2 in Foo2). At first glance, because the two layers share the same prefix, we should be able to share the parameters, i.e., directly load the parameters saved from foo into foo2.

However, the following code will trigger an error:

import mxnet as mx
from mxnet.gluon import HybridBlock, nn
import tempfile
import os
mx.npx.set_np()

class Foo(HybridBlock):
    def __init__(self, prefix=None, params=None):
        super().__init__(prefix=prefix, params=params)
        with self.name_scope():
            self.l1 = nn.Dense(16, prefix='layer_')

    def hybrid_forward(self, F, x):
        return self.l1(x)

class Foo2(HybridBlock):
    def __init__(self, prefix=None, params=None):
        super().__init__(prefix=prefix, params=params)
        with self.name_scope():
            self.l2 = nn.Dense(16, prefix='layer_')

    def hybrid_forward(self, F, x):
        return self.l2(x)

foo = Foo()
foo.initialize()
foo(mx.np.ones((32, 6)))
foo2 = Foo2()
with tempfile.TemporaryDirectory() as dir_path:
    foo.save_parameters(os.path.join(dir_path, 'test.params'))
    foo2.load_parameters(os.path.join(dir_path, 'test.params'))

Error message:

AssertionError: Parameter 'l2.weight' is missing in file '/tmp/tmpkf_n3w3s/test.params', which contains parameters: 'l1.weight', 'l1.bias'. Set allow_missing=True to ignore missing parameters.

Thus, Gluon matches parameters by attribute name rather than by prefix when saving and loading.
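Under this behavior, a saved file can only be loaded into a network whose attribute paths match. As an illustration (not an official recommendation), one possible workaround is to remap the keys of the saved dict before loading. The sketch below reuses foo and foo2 from the snippet above and assumes mx.npx.save is available as the counterpart of the mx.npx.load call used later in this issue:

import os
import tempfile
import mxnet as mx

# Workaround sketch: remap 'l1.*' keys (Foo's attribute path) to 'l2.*'
# (Foo2's attribute path) so that foo2.load_parameters can find them.
with tempfile.TemporaryDirectory() as dir_path:
    src = os.path.join(dir_path, 'test.params')
    dst = os.path.join(dir_path, 'test_remapped.params')
    foo.save_parameters(src)
    loaded = mx.npx.load(src)  # dict keyed by attribute path, e.g. 'l1.weight'
    remapped = {k.replace('l1.', 'l2.', 1): v for k, v in loaded.items()}
    mx.npx.save(dst, remapped)  # assumes npx.save mirrors npx.load for dicts of ndarrays
    foo2.load_parameters(dst)   # now finds 'l2.weight' and 'l2.bias'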

To understand the problem further, consider the following example, in which we create a network with four dense layers that all share the same parameters. When we call save_parameters, the saved file should ideally contain only a single copy of the weights; however, it currently contains four copies. This is not acceptable in deployment settings, where there is a hard constraint on the size of the artifact.

import mxnet as mx
from mxnet.gluon import HybridBlock, nn
import tempfile
import os
mx.npx.set_np()

class Foo(HybridBlock):
    def __init__(self, prefix=None, params=None):
        super().__init__(prefix=prefix, params=params)
        with self.name_scope():
            self.l1 = nn.Dense(2048, prefix='layer_')
            self.l2 = nn.Dense(2048, params=self.l1.collect_params())
            self.l3 = nn.Dense(2048, params=self.l1.collect_params())
            self.l4 = nn.Dense(2048, params=self.l1.collect_params())

    def hybrid_forward(self, F, x):
        return self.l4(self.l3(self.l2(self.l1(x))))

class Foo2(HybridBlock):
    def __init__(self, prefix=None, params=None):
        super().__init__(prefix=prefix, params=params)
        with self.name_scope():
            self.l1 = nn.Dense(2048, prefix='layer_')

    def hybrid_forward(self, F, x):
        return self.l1(x)

foo = Foo()
foo.initialize()
foo(mx.np.ones((32, 2048)))
foo2 = Foo2(params=foo.collect_params())

with tempfile.TemporaryDirectory() as dir_path:
    foo.save_parameters(os.path.join(dir_path, 'foo1_save_parameters.params'))
    foo2.save_parameters(os.path.join(dir_path, 'foo2_save_parameters.params'))
    print('Keys by collect_params():', foo.collect_params().keys())
    print('Keys by loading the shared parameters:', mx.npx.load(os.path.join(dir_path, 'foo1_save_parameters.params')).keys())    
    print('Four shared layer artifact size:', os.stat(os.path.join(dir_path, 'foo1_save_parameters.params')).st_size)
    print('One layer artifact size:', os.stat(os.path.join(dir_path, 'foo2_save_parameters.params')).st_size)

The output is shown below. The artifact produced by foo.save_parameters() is about four times the size of the one produced by foo2.save_parameters(), even though the two networks hold exactly the same parameters and the files should be the same size.

Keys by collect_params(): odict_keys(['foo3_layer_weight', 'foo3_layer_bias'])
Keys by loading the shared parameters: dict_keys(['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias', 'l3.weight', 'l3.bias', 'l4.weight', 'l4.bias'])
Four shared layer artifact size: 67142080
One layer artifact size: 16785544
leezu commented 4 years ago

Fixed by https://github.com/apache/incubator-mxnet/commit/8c44af4eba798b379c374c15582f9aea7dd7d8fd ?

Need to use deduplicate=True

We should make this the default in MXNet 2

sxjscience commented 4 years ago

Confirmed that using save_parameters(..., deduplicate=True) will solve this problem.
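For reference, a minimal sketch (reusing foo from the four-shared-layer example above; the file name is illustrative):

import os
import tempfile

with tempfile.TemporaryDirectory() as dir_path:
    dedup_path = os.path.join(dir_path, 'foo1_dedup.params')
    # With deduplicate=True, each shared parameter is written only once,
    # so the artifact size drops to roughly that of a single dense layer.
    foo.save_parameters(dedup_path, deduplicate=True)
    print('Deduplicated artifact size:', os.stat(dedup_path).st_size)
    # Loading the deduplicated file back restores all four shared layers.
    foo.load_parameters(dedup_path)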

leezu commented 4 years ago

Let's track changing the default for MXNet 2 in this issue

sxjscience commented 4 years ago

@leezu Should we submit a PR to change the default behavior? I think we should fix it as early as possible because we rely on save_parameters() to generate the model zoos.