Closed lagefreitas closed 6 years ago
The code doesn't train in parallel; it iterates over the workers sequentially, as the log timestamps show:
[...]
From worker 2: [2018-09-03 18:34:22 | info | Mocha]: TRAIN iter=000000 obj_val=0.04129305
From worker 2: [2018-09-03 18:34:22 | info | Mocha]: Saving snapshot to snapshot-000000.jld...
From worker 2: [2018-09-03 18:34:22 | warn | Mocha]: Overwriting Snapshots/snapshots_2/snapshot-000000.jld...
From worker 2: [2018-09-03 18:34:29 | info | Mocha]: TRAIN iter=001000 obj_val=0.00403221
From worker 2: [2018-09-03 18:34:29 | info | Mocha]: Saving snapshot to snapshot-001000.jld...
[...]
From worker 8: [2018-09-03 18:36:48 | info | Mocha]: Snapshot directory Snapshots/snapshots_8 already exists
From worker 8: [2018-09-03 18:36:49 | info | Mocha]: TRAIN iter=000000 obj_val=0.37066936
From worker 8: [2018-09-03 18:36:49 | info | Mocha]: Saving snapshot to snapshot-000000.jld...
[...]
We solved it by using Julia's @async macro.
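The training code itself is not shown in this thread, so the following is only a minimal sketch of the pattern, assuming Julia 1.x (where Distributed is a standard library) and a blocking call such as remotecall_fetch inside the loop. The function train_on_worker is a hypothetical stand-in for the real Mocha training call, and the two local workers are added purely for illustration.

```julia
using Distributed

# Assumption: start two local workers if none exist yet.
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for the real per-worker training routine.
@everywhere function train_on_worker(id)
    sleep(1)                 # simulate a long-running training job
    return (id, myid())      # report which process actually ran it
end

# Sequential pattern (what the log suggests was happening):
# each remotecall_fetch blocks until that worker finishes.
# for w in workers()
#     remotecall_fetch(train_on_worker, w, w)
# end

# Concurrent pattern: @async launches one task per worker without blocking,
# and @sync waits until all of those tasks have finished.
results = Vector{Any}(undef, nworkers())
@sync for (i, w) in enumerate(workers())
    @async results[i] = remotecall_fetch(train_on_worker, w, w)
end
```

Wrapping each blocking call in @async lets the master process issue all remote calls before waiting on any of them, so the workers run at the same time. The enclosing @sync is optional, but without it the loop returns before the results are available.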