feldberlin / wavenet

An unconditioned Wavenet implementation with fast generation.

Introduce DDP #17

Closed: purzelrakete closed this 3 years ago

purzelrakete commented 3 years ago

What

Use nn.DistributedDataParallel instead of nn.DataParallel.

Why

The immediate motivation is to use SyncBatchNorm, which is only available under DDP. However, DDP is also the recommended data-parallel method in PyTorch. As it turns out, the machinery that has to be added is fairly minimal when forking the worker processes.
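A minimal sketch of what this wiring looks like, assuming one GPU per process; `worker`, `launch` and `model_fn` are illustrative names and not part of this repo's code:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size, model_fn):
    # one process per GPU; rendezvous via env vars (values are placeholders)
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = model_fn().to(rank)
    # SyncBatchNorm swaps every BatchNorm layer for a synchronised version;
    # it only works under DDP, which is the motivation for this change.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[rank])

    # ... training loop over a DistributedSampler-backed loader goes here ...

    dist.destroy_process_group()

def launch(model_fn, world_size):
    # forking keeps the extra machinery small, as noted above; CUDA must not
    # have been initialised in the parent process before the fork.
    mp.start_processes(worker, args=(world_size, model_fn),
                       nprocs=world_size, start_method="fork")
```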

Acceptance Criteria

purzelrakete commented 3 years ago

Experiments

bin/train maestro -p batch_size 12 -p batch_norm True -p learning_rate 0.05 -p max_epochs 12 -p sample_overlap_receptive_field True

Due to the averaging of gradients across DDP workers, we have to be careful that we have not effectively halved the learning rate. This should not be the case, though, since the cross-entropy loss uses mean reduction by default, so the gradient scale matches single-process training. Nevertheless, also training with a doubled learning rate:

bin/train maestro -p batch_size 12 -p batch_norm True -p learning_rate 0.1 -p max_epochs 12 -p sample_overlap_receptive_field True
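The reasoning about the learning rate can be checked without a GPU: with a mean-reduced loss, averaging per-worker gradients (what DDP's allreduce does) gives the same gradient as a single process taking the mean over the full global batch, so no rescaling is needed. A small single-process sanity check, using a squared-error loss in place of cross entropy since the argument only depends on the mean reduction:

```python
import torch

torch.manual_seed(0)
W, B, D = 4, 3, 5                       # workers, per-worker batch, features
w = torch.randn(D, requires_grad=True)  # shared parameter
x = torch.randn(W * B, D)
y = torch.randn(W * B)

# single process: global batch of W*B samples, mean reduction
loss = ((x @ w - y) ** 2).mean()
g_global, = torch.autograd.grad(loss, w)

# W workers: each takes the mean over its local batch of B samples,
# then DDP averages the per-worker gradients
g_workers = []
for i in range(W):
    xi, yi = x[i * B:(i + 1) * B], y[i * B:(i + 1) * B]
    li = ((xi @ w - yi) ** 2).mean()
    g_workers.append(torch.autograd.grad(li, w)[0])
g_ddp = torch.stack(g_workers).mean(0)

print(torch.allclose(g_global, g_ddp))  # True: same gradient scale, same lr
```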