edmBernard / mxnet_example_shared_weight

small examples to test shared layer

About "not working" #2

Closed Light-- closed 3 years ago

Light-- commented 3 years ago

Excuse me, and thanks for sharing, but what do you mean by "not working" in the README? Are the scripts marked "not working" wrong?

edmBernard commented 3 years ago

Oh, an old project. I made this repository when I was trying to implement shared weights for sequential training in MXNet. "Not working" in the README means the weights are not actually shared between trainings. By the way, I advise you to use the Gluon API in MXNet. Three years ago it was still new, but now it is the recommended way to implement networks in MXNet, and it works.

Light-- commented 3 years ago

By the way, I advise you to use the Gluon API in MXNet. Three years ago it was still new, but now it is the recommended way to implement networks in MXNet, and it works.

@edmBernard Thanks very much for your reply. i. May I ask whether there is any good MXNet example of a siamese network with weight sharing? Either symbol-based or Gluon-based is okay. ii. Since "not working" means the weights are not shared, I assume the scripts without the "not working" comment do work correctly (share weights between trainings), like demo_shared_with_weight_copy_each_time.py.

But all I see in the script is model1.fit, model2.fit, then model1.fit. Why does doing this share parameters during training? Aren't the parameters only shared once after each model is trained, rather than repeatedly after every training batch finishes? Sorry, I'm confused... could you help explain? Thanks.

edmBernard commented 3 years ago

For a siamese implementation, it's in this file: demo_with_gluon_siamese. I didn't find an "official" implementation of siamese networks (at the time, I used them for image retrieval).

In some of my files I share weights in a different structure than a siamese network. A siamese network uses the same network for different steps. In demo_shared_with_weight_copy_each_time, I use 2 networks with a shared feature extractor; each network has a different Dense layer at the end. Between training model1 and model2, I extract the parameters from model1 with arg_param, aux_param = mlp_model1.get_params() and initialise model2 with them.
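Roughly, the pattern looks like this (a simplified, hypothetical sketch with toy data and made-up layer names, not the exact script): the two symbols share the trunk layers by name, and passing model1's parameters into model2's fit with allow_missing=True reuses the trunk while model2's own head is initialised from scratch.

import mxnet as mx
import numpy as np

# toy data iterator (stand-in for the MNIST iterator used in the repo)
X = np.random.rand(100, 20).astype('float32')
y = np.random.randint(0, 10, size=(100,))
train_iter = mx.io.NDArrayIter(X, y, batch_size=10, label_name='softmax_label')

# shared feature extractor: the same symbol definition is used by both models
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data, name='shared_fc', num_hidden=64)
feat = mx.sym.Activation(fc, name='shared_relu', act_type='relu')

# each model gets its own Dense head on top of the shared trunk
head1 = mx.sym.FullyConnected(feat, name='head1_fc', num_hidden=10)
head2 = mx.sym.FullyConnected(feat, name='head2_fc', num_hidden=10)
mlp_model1 = mx.mod.Module(mx.sym.SoftmaxOutput(head1, name='softmax'), context=mx.cpu())
mlp_model2 = mx.mod.Module(mx.sym.SoftmaxOutput(head2, name='softmax'), context=mx.cpu())

# train model1, then seed model2 with its parameters:
# weights with matching names ('shared_*') are reused, and
# allow_missing=True lets model2 initialise its own head
mlp_model1.fit(train_iter, num_epoch=1)
arg_params, aux_params = mlp_model1.get_params()
mlp_model2.fit(train_iter, num_epoch=1,
               arg_params=arg_params, aux_params=aux_params,
               allow_missing=True)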

Light-- commented 3 years ago

Between training model1 and model2, I extract the parameters from model1 with arg_param, aux_param = mlp_model1.get_params() and initialise model2 with them.

@edmBernard Thanks for your quick reply! But using your script (demo_shared_with_weight_copy_each_time), I can only do that (extract the parameters for sharing) 2 times, am I right? Because you only call the .fit() function 3 times, right?

mlp_model1.fit(train_iter,  # Line 49
...
arg_param, aux_param = mlp_model1.get_params() # sharing for the first time
mlp_model2.fit(train_iter,  # Line 63
...
arg_param, aux_param = mlp_model2.get_params() # sharing for the second time
mlp_model1.fit(train_iter,  # Line 77

Your script is not able to share params between the 2 models repeatedly (far more than 2 times)...

You extract and share the params with the second model only after the first model's training is finished. You cannot repeat it unless you write many, many lines like:

model1.fit
arg_param, aux_param = model1.get_params()
model2.fit
arg_param, aux_param = model2.get_params()
model1.fit
arg_param, aux_param = model1.get_params()
model2.fit
arg_param, aux_param = model2.get_params()
model1.fit
arg_param, aux_param = model1.get_params()
...

Am I correct?

My understanding of parameter sharing in a Siamese network:

model1.forward(batch)
model2.params = copy(model1.copy_params)
model2.forward(batch)
backward()

model1.forward(batch)
model2.params = copy(model1.copy_params)
model2.forward(batch)
backward()

model1.forward(batch)
model2.params = copy(model1.copy_params)
model2.forward(batch)
backward()
...

We share params during the whole training process, batch-wise, repeatedly... please correct me if I'm wrong...

edmBernard commented 3 years ago

Yes, with the method in the file demo_shared_with_weight_copy_each_time you have to call fit and copy the params each time. (That's why I don't recommend using it: it's really slow compared to the Gluon way.)
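Roughly, that repeated fit/copy pattern would look like this if wrapped in a loop (a hypothetical sketch reusing the toy mlp_model1, mlp_model2 and train_iter from the sketch above, not code from the repository):

# alternate training between the two models, copying the parameters each time
models = [mlp_model1, mlp_model2]
arg_params, aux_params = None, None
for step in range(6):
    model = models[step % 2]
    model.fit(train_iter, num_epoch=1,
              arg_params=arg_params, aux_params=aux_params,
              allow_missing=True,   # the other model's head is not in the copied params
              force_init=True)      # overwrite already-initialised params on re-fit
    arg_params, aux_params = model.get_params()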

In fact, in a Siamese network you only have one network. For example, in image retrieval you feed different images (query, positive, negative) to the same network, compute a loss, and backpropagate. demo_shared_with_weight_copy_each_time is not a siamese network; it shares weights, but for a slightly different purpose.
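In Gluon that idea looks roughly like this (a simplified sketch with a toy architecture and random data, not the exact code of demo_with_gluon_siamese): one network embeds the query, positive and negative, so the weights are shared by construction, and a single backward pass updates them.

import mxnet as mx
from mxnet import autograd, gluon

# one embedding network, used for every branch
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(128, activation='relu'),
        gluon.nn.Dense(64))
net.initialize(mx.init.Xavier())

triplet_loss = gluon.loss.TripletLoss(margin=1.0)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

# toy batch: in image retrieval these would be the query/positive/negative images
query = mx.nd.random.uniform(shape=(8, 20))
positive = mx.nd.random.uniform(shape=(8, 20))
negative = mx.nd.random.uniform(shape=(8, 20))

with autograd.record():
    # the same net (hence the same weights) embeds all three inputs
    loss = triplet_loss(net(query), net(positive), net(negative))
loss.backward()
trainer.step(batch_size=8)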