bin/train maestro -p batch_size 12 -p batch_norm True -p learning_rate 0.05 -p max_epochs 12 -p sample_overlap_receptive_field True
Due to the averaging of gradients across DDP workers, we have to be careful not to effectively halve the learning rate. This should not happen, though, since cross entropy (xent) uses mean reduction by default. Nevertheless, also training with a doubled learning rate:
bin/train maestro -p batch_size 12 -p batch_norm True -p learning_rate 0.1 -p max_epochs 12 -p sample_overlap_receptive_field True
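For reference, a minimal sketch (with made-up tensor shapes, not taken from the trainer) of why mean reduction matters here: each worker's cross-entropy loss is already averaged over its local batch, and DDP then averages the resulting gradients across workers, so the gradient keeps the same scale as a single-process run rather than being implicitly halved.

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss defaults to reduction='mean': each DDP worker's loss is
# already an average over its local batch. DDP then averages gradients across
# workers, so the combined gradient has the same scale as a single-process
# run and the effective learning rate is not halved.
logits = torch.randn(12, 10)            # hypothetical: batch of 12, 10 classes
targets = torch.randint(0, 10, (12,))

xent_mean = nn.CrossEntropyLoss()                   # default reduction='mean'
xent_sum = nn.CrossEntropyLoss(reduction='sum')

assert torch.allclose(xent_mean(logits, targets),
                      xent_sum(logits, targets) / logits.shape[0])
```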
What
Use nn.DistributedDataParallel instead of nn.DataParallel.
Why
The immediate motivation is to use SyncBatchNorm, which is only available under DDP. DDP is also the recommended data-parallel approach in PyTorch anyway. As it turns out, the machinery that has to be added is pretty minimal when forking the worker processes.
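To give a sense of scale, here is a minimal sketch of that machinery (not the project's actual trainer): one process per worker started via the fork start method, BatchNorm layers converted to SyncBatchNorm, and the model wrapped in DDP. The gloo backend, world size of 2, and toy model are placeholders; on GPUs one would typically use the nccl backend with one device per process (and CUDA must not be initialized before forking).

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Sequential(
        torch.nn.Conv1d(1, 8, 3), torch.nn.BatchNorm1d(8), torch.nn.ReLU()
    )
    # SyncBatchNorm only works under DDP; convert before wrapping.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model)  # pass device_ids=[rank] when using one GPU per process

    # ... training loop with a DistributedSampler goes here ...

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.start_processes(worker, args=(world_size,), nprocs=world_size,
                       start_method="fork")
```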
Acceptance Criteria