apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

How do I make a siamese network with pretrained models (esp. keeping the weights the same?) #8591

Closed tz-hmc closed 6 years ago

tz-hmc commented 6 years ago

Description

How do I ensure the weights are kept the same? Can I unpack the internal layers somehow and set the weights of each to the same variable? My apologies, I'm new to MXNet. Would really appreciate the help, thanks!

sym1, arg_params1, aux_params1 = mx.model.load_checkpoint('resnet-152', 0)
sym2, arg_params2, aux_params2 = mx.model.load_checkpoint('resnet-152', 0)
layer1 = sym1.get_internals()
layer2 = sym2.get_internals()
for i in range(len(layer1)): # will something like this work?
    arg_params1[i] = arg_params2[i]

Relevant answers, but not specific enough to my particular problem:

- https://github.com/apache/incubator-mxnet/issues/772 (siamese networks)
- https://github.com/apache/incubator-mxnet/issues/6791 (extract layers as variables)
- https://github.com/apache/incubator-mxnet/issues/557 (set weights to be the same)

edmBernard commented 6 years ago

I ran into a similar problem and managed to find several solutions: https://github.com/apache/incubator-mxnet/issues/7530. With the Gluon API it's easy and straightforward; with the Module API it's something else :( I put my tests here: https://github.com/edmBernard/mxnet_example_shared_weight. The README describes which approaches work and which don't.
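
For the Gluon side of that comparison, here is a minimal sketch of explicit weight sharing between two Gluon layers (the layer sizes and the `params=first.params` wiring are just an illustration, not code from the linked repo):

    from mxnet import nd
    from mxnet.gluon import nn

    # Two Dense layers that point at the same Parameter objects: the second
    # layer is built with `params=first.params`, so updating one updates both.
    first = nn.Dense(128, activation='relu')
    second = nn.Dense(128, activation='relu', params=first.params)

    first.initialize()                    # `second` shares the same parameters
    x = nd.random.uniform(shape=(4, 512))
    out_first = first(x)                  # triggers deferred shape inference
    out_second = second(x)                # reuses the exact same weight and bias
    print((out_first - out_second).abs().sum())  # all zeros: weights are shared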

tz-hmc commented 6 years ago

Wow, I didn't know that API existed. I had a lot of trouble trying to make it work with the module API but the Gluon API looks super promising, thanks for sharing :)

I'll definitely test Gluon out, but do you know how I would do this with the Module API?

Can I extract each layer's functionality somehow and set its weights to the same variable as the identical layer in the other network? If it's too big a hassle, I guess I'll use Gluon, though all my other code uses the Module API.

edmBernard commented 6 years ago

If you have exactly the same network twice, it might be possible to use shared_module in the bind function; it's used in RNNs to duplicate a network. I wasn't able to use it myself, as my two networks were not exactly the same (here). In my opinion, it will be easier to switch to Gluon, and you can be sure it will work. Moreover, in Gluon you can reuse a network defined with the symbol API (here; I haven't tested it).
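
On that last point, a rough sketch of wrapping a symbol-API checkpoint in Gluon, following the common gluon.SymbolBlock pattern (note that _load_init is an internal helper; newer MXNet versions also provide gluon.SymbolBlock.imports for the same job):

    import mxnet as mx
    from mxnet import gluon

    # Reuse the checkpoint that was loaded with the symbol API.
    sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
    feat = sym.get_internals()['flatten0_output']  # embedding output

    # Wrap the symbol in a Gluon block and copy the pretrained weights into it.
    net = gluon.SymbolBlock(outputs=feat, inputs=mx.sym.var('data'))
    net_params = net.collect_params()
    for name in arg_params:
        if name in net_params:
            net_params[name]._load_init(arg_params[name], ctx=mx.cpu())
    for name in aux_params:
        if name in net_params:
            net_params[name]._load_init(aux_params[name], ctx=mx.cpu())

    # One copy of the network applied to both images: the weights are shared
    # by construction, nothing has to be copied between two models.
    x1 = mx.nd.random.uniform(shape=(1, 3, 224, 224))
    x2 = mx.nd.random.uniform(shape=(1, 3, 224, 224))
    emb1, emb2 = net(x1), net(x2)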

tz-hmc commented 6 years ago

Hey again,

I tried something like this, but I still have a lot of questions:

    sym1, arg_params, aux_params = get_model()
    sym2, arg_params, aux_params = get_model()

    mod1 = mx.mod.Module(symbol=sym1, context=mx.cpu(), label_names=None)
    mod2 = mx.mod.Module(symbol=sym2, context=mx.cpu(), label_names=None)
    mod1.bind(for_training=True, shared_module=mod2, data_shapes=[('data', (1,3,224,224))], # true to train
             label_shapes=mod1._label_shapes)
    mod2.bind(for_training=True, shared_module=mod1, data_shapes=[('data', (1,3,224,224))], # true to train
             label_shapes=mod2._label_shapes)
    mod1.set_params(arg_params, aux_params, allow_missing=True)
    mod2.set_params(arg_params, aux_params, allow_missing=True)

    out1 = sym1.get_internals()['flatten0_output']
    out2 = sym2.get_internals()['flatten0_output']
    siamese_out = mx.sym.Concat(out1, out2, dim=0)

    # Example stacked network after it
    fc1  = mx.symbol.FullyConnected(data = siamese_out, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
    fc2  = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
    act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
    fc3  = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes)
    mlp  = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax')
    # new_args = dict()

    # module for the stacked head on top of the siamese outputs
    mod3 = mx.mod.Module(fc1, context=mx.cpu(), label_names=None)
    mod3.bind(for_training=False, data_shapes=[('data', (1,3,224,224))])
    mod3.set_params(arg_params, aux_params)

I only want the first part of this network (layers attached to mod2 & mod1) to be shared. Would something like this work & still backpropagate errors appropriately when fitted?

Having to run mod.fit on each part of the network could be inconvenient. Is there a way around this?

edmBernard commented 6 years ago

I haven't tested shared_module in anything similar to your application. (Are you sure you don't want to use Gluon?) :)

I haven't tested your code, but here are a few corrections:

# you don't need `shared_module=mod2`:
mod1.bind(for_training=True, data_shapes=[('data', (1,3,224,224))], label_shapes=mod1._label_shapes)

If you want to train everything as one network, you need to define a new data iterator that can feed two different images into your network.
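
A rough sketch of such an iterator (the input names 'data1'/'data2', the 224x224 shapes and the 'softmax_label' name are assumptions; they must match the variables in your combined symbol):

    import mxnet as mx

    class SiameseIter(mx.io.DataIter):
        """Feeds a pair of image batches plus labels at each step (sketch only)."""
        def __init__(self, imgs1, imgs2, labels, batch_size):
            super(SiameseIter, self).__init__(batch_size)
            self.imgs1, self.imgs2, self.labels = imgs1, imgs2, labels
            self.cursor = 0
            self.provide_data = [('data1', (batch_size, 3, 224, 224)),
                                 ('data2', (batch_size, 3, 224, 224))]
            self.provide_label = [('softmax_label', (batch_size,))]

        def reset(self):
            self.cursor = 0

        def next(self):
            if self.cursor + self.batch_size > len(self.labels):
                raise StopIteration
            i, j = self.cursor, self.cursor + self.batch_size
            self.cursor = j
            return mx.io.DataBatch(
                data=[mx.nd.array(self.imgs1[i:j]), mx.nd.array(self.imgs2[i:j])],
                label=[mx.nd.array(self.labels[i:j])],
                provide_data=self.provide_data,
                provide_label=self.provide_label)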

Maybe it's easier to try this example of a triplet-loss network (I haven't tested whether it works).
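
In case that link goes stale, the core of a triplet-style loss in the symbol API looks roughly like the sketch below; the variable names, the margin value and the assumption that anchor/pos/neg are embeddings produced by the shared network are mine, not taken from that example:

    import mxnet as mx

    # Embeddings produced by three passes of the shared network (assumed inputs).
    anchor = mx.sym.Variable('anchor')
    pos = mx.sym.Variable('pos')
    neg = mx.sym.Variable('neg')

    # Squared Euclidean distances between the anchor and each candidate.
    d_pos = mx.sym.sum(mx.sym.square(anchor - pos), axis=1)
    d_neg = mx.sym.sum(mx.sym.square(anchor - neg), axis=1)

    # Hinge on the distance gap; MakeLoss turns the expression into a training loss.
    margin = 0.2
    loss = mx.sym.MakeLoss(mx.sym.relu(d_pos - d_neg + margin))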

edmBernard commented 6 years ago

Here is an example using Gluon.

tz-hmc commented 6 years ago

Wow. Thank you so much. Alright, this gives me a lot to think about. I'm really grateful for your help, thanks a ton.

aodhan-domhnaill commented 6 years ago

If you want to share weights across the network, why not just use one copy of the network and run it twice with the inputs?

final_net(nd.concat(shared_net(x1), shared_net(x2)))

Also, I definitely recommend using Gluon instead of pure MXNet.
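
A minimal Gluon sketch of that one-copy idea (the tiny stand-in feature extractor, the layer sizes and num_classes are placeholders, not a substitute for the pretrained ResNet):

    from mxnet import nd
    from mxnet.gluon import nn

    num_classes = 2  # placeholder

    # Stand-in for the shared feature extractor (in practice: the pretrained net).
    shared_net = nn.HybridSequential()
    shared_net.add(nn.Conv2D(16, kernel_size=3, activation='relu'),
                   nn.GlobalAvgPool2D(),
                   nn.Dense(64, activation='relu'))

    # Head that consumes the concatenated embeddings.
    final_net = nn.HybridSequential()
    final_net.add(nn.Dense(128, activation='relu'),
                  nn.Dense(num_classes))

    shared_net.initialize()
    final_net.initialize()

    x1 = nd.random.uniform(shape=(1, 3, 224, 224))
    x2 = nd.random.uniform(shape=(1, 3, 224, 224))

    # The same block (hence the same parameters) processes both inputs.
    out = final_net(nd.concat(shared_net(x1), shared_net(x2), dim=1))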

szha commented 6 years ago

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

nswamy commented 6 years ago

@tz-hmc, Hope your question has been answered. For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.