Closed: power1628 closed this issue 3 years ago.
We use GraphSAGE as an example to show how to train in parallel on multiple machines. Currently we only provide an asynchronous training example; you can replace it with synchronous training by using a synchronous optimizer in TensorFlow.
Initializing the optimizer with the use_locking=True flag in DistTFTrainer will perform synchronous training in distributed settings. Modify the code here.
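For reference, a minimal sketch of that change. The optimizer class and learning rate here are assumptions for illustration, not graph-learn's actual defaults, and the follow-up below questions whether this flag alone gives distributed synchronization:

```python
import tensorflow as tf

# Sketch only: optimizer type and learning rate are placeholders; the real
# optimizer is constructed inside graph-learn's trainer code.
# use_locking=True makes each variable update atomic, so concurrent
# updates from multiple threads do not interleave.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, use_locking=True)
```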
Are you sure? The use_locking flag only guarantees multi-thread safety, not distributed synchronous training. BTW, I don't think this issue should be closed.
I misunderstood you. Basically, distributed synchronous training takes a few more steps than setting use_locking; a sketch follows below.
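For illustration, a minimal sketch of one standard way to do this in TF1.x, using tf.train.SyncReplicasOptimizer to aggregate gradients from all workers before each update is applied. The cluster addresses, the stand-in loss, and the hyperparameters are all placeholder assumptions, not graph-learn's actual code:

```python
import tensorflow as tf

# Placeholder cluster; in practice these come from your deployment config.
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})
job_name, task_index = "worker", 0  # set per process
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()
else:
    is_chief = (task_index == 0)
    num_workers = cluster.num_tasks("worker")

    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        # Stand-in model: replace with your GraphSAGE loss.
        x = tf.random_normal([32, 10])
        w = tf.get_variable("w", [10, 1])
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

        global_step = tf.train.get_or_create_global_step()
        opt = tf.train.AdamOptimizer(learning_rate=0.001)

        # Wrap the base optimizer so gradients from all workers are
        # aggregated before a single synchronous update is applied.
        sync_opt = tf.train.SyncReplicasOptimizer(
            opt,
            replicas_to_aggregate=num_workers,
            total_num_replicas=num_workers)
        train_op = sync_opt.minimize(loss, global_step=global_step)

    # The hook sets up the queues that coordinate synchronous updates;
    # the chief worker additionally initializes them.
    sync_hook = sync_opt.make_session_run_hook(is_chief)

    with tf.train.MonitoredTrainingSession(master=server.target,
                                           is_chief=is_chief,
                                           hooks=[sync_hook]) as sess:
        while not sess.should_stop():
            sess.run(train_op)
```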
A synchronous example will be posted soon.
This issue should be re-opened since it's not fixed.
The question is how to synchronize training using TensorFlow, and the discussion above has given advice.
Great work! I wonder how graph-learn does synchronous training. It would be great if there were a distributed synchronous training example.