basveeling / wavenet

Keras WaveNet implementation
https://soundcloud.com/basveeling/wavenet-sample
1.06k stars · 219 forks

Performance benchmarks #14

Closed lelayf closed 7 years ago

lelayf commented 7 years ago

Hi! I'm training a reasonably small network with 111,888 parameters (2 stacks, dilation depth 10, and all other settings left untouched) on a single-speaker dataset (77MB of 16-bit 44.1kHz WAV files = 32784 batches of 16). The ETA for one epoch is 42682s, so almost 12 hours. I'm using a Tesla K80; is that training time in line with your experience? I know for sure I am processing 100K samples in half a second with a different implementation (based on TensorFlow).

When we say batch size is 16, are these 16 fragments of 4223 samples = 67568 samples? If that's the case, then I am processing 67K samples in about 30 seconds.

Also, is the stride a way of saying our fragments should overlap by 128 samples when the input is split?

cheers
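(For reference, in most WaveNet data pipelines the stride is the hop between fragment start points, so consecutive fragments overlap by fragment_length - stride samples. A minimal sketch of that interpretation, with toy numbers rather than the repo's actual splitting code:)

```python
# Minimal illustration of splitting audio into strided fragments.
# With stride < fragment_length, consecutive fragments overlap by
# fragment_length - stride samples. (Toy sketch, not the repo's code.)

def make_fragments(samples, fragment_length, stride):
    return [samples[i:i + fragment_length]
            for i in range(0, len(samples) - fragment_length + 1, stride)]

audio = list(range(20))           # toy "audio" of 20 samples
frags = make_fragments(audio, fragment_length=8, stride=4)
# overlap between consecutive fragments = 8 - 4 = 4 samples
print(len(frags))                 # 4 fragments
print(frags[0])                   # [0, 1, 2, 3, 4, 5, 6, 7]
print(frags[1])                   # [4, 5, 6, 7, 8, 9, 10, 11]
```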

basveeling commented 7 years ago

Hey! The default parameters are set up to predict every sample, which is probably not what you want with a big dataset. It seems better to just take random fragments from your training set during training. I recommend making your own named_config with the following settings to start training quickly:

@ex.named_config
def custom_config(desired_sample_rate):
    nb_filters = 32
    dilation_depth = 7
    nb_stacks = 4
    # start a new fragment every 0.1s of audio
    fragment_stride = int(desired_sample_rate / 10)
    # sample random fragments instead of sliding over every sample
    random_train_batches = True
    # receptive field plus 32 samples to predict per fragment
    fragment_length = 32 + (compute_receptive_field_(desired_sample_rate, dilation_depth, nb_stacks)[0])
    # only compute the loss on predictions with a full receptive field
    train_only_in_receptive_field = True

You're then effectively training on 16 * 32 samples per batch (16 fragments per batch, each contributing 32 trainable samples past the receptive field), and you'll have seconds_in_dataset * 10 / 16 batches per epoch (as governed by fragment_stride).
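To make that arithmetic concrete, here is a rough back-of-envelope sketch. The sample rate and dataset length below are hypothetical placeholders, and the repo's compute_receptive_field_ remains authoritative for the actual fragment length:

```python
# Back-of-envelope for the batch/epoch numbers above.
# desired_sample_rate and seconds_in_dataset are hypothetical values.

desired_sample_rate = 4000      # hypothetical sample rate
batch_size = 16
train_window = 32               # trainable samples per fragment

fragment_stride = desired_sample_rate // 10    # one fragment per 0.1 s
samples_per_batch = batch_size * train_window  # 16 * 32 = 512

seconds_in_dataset = 600        # hypothetical: 10 minutes of audio
n_fragments = seconds_in_dataset * desired_sample_rate // fragment_stride
batches_per_epoch = n_fragments // batch_size  # seconds_in_dataset * 10 / 16

print(samples_per_batch)   # 512
print(batches_per_epoch)   # 375
```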

You can then run it with: python wavenet.py with custom_config

Good luck!