clab / dynet

DyNet: The Dynamic Neural Network Toolkit
Apache License 2.0

Error while using Multi-processing #817

Open PhantomGrapes opened 7 years ago

PhantomGrapes commented 7 years ago

Hello, I have completed a project using DyNet 1.1. Following xor-mp and rnnlm-mp, I implemented multi-processing and everything seemed fine: no errors are reported, the training loss goes down after every epoch, and I can use the saved model to predict. But I found that the saved model is always the initial model.

So I debugged the project and stepped into run_parent in mp.h, and found that changes made by the child processes never seem to appear in run_parent. For example, I added an integer attribute i to the learner, initialized it to 0, set it to 7 in LearnFromDatum, and printed its value in SaveModel. While debugging inside run_parent, after runDataset the child process calls LearnFromDatum and prints the training loss. But when I run "print SaveModel()" in gdb, the attribute i in the learner is still 0 (it should have been changed to 7 in LearnFromDatum during training). I am really confused: it seems the learner in run_parent is a different object from the one in run_child. Can you give me some advice on solving this? The project is not simple, so it would be hard to recode it in DyNet 2.0. Thanks!

armatthews commented 7 years ago

Hello, PhantomGrapes!

These symptoms sound like DyNet is not sharing parameters across processes. Can you please verify that you set shared_parameters to true when initializing DyNet, as shown here:

dynet::initialize(argc, argv, true);

neubig commented 7 years ago

It looks like this fixed the problem, but it is not documented in the multiprocessing documentation, so I'll mark this as a minor bug.