Hey! The default parameters are set up to predict every sample, which is probably not what you want with a big dataset. It seems better to just take random fragments from your training set during training. I recommend making your own named_config with the following settings to start training quickly:
```python
@ex.named_config
def custom_config(desired_sample_rate):
    nb_filters = 32
    dilation_depth = 7
    nb_stacks = 4
    fragment_stride = int(desired_sample_rate / 10)
    random_train_batches = True
    fragment_length = 32 + compute_receptive_field_(desired_sample_rate, dilation_depth, nb_stacks)[0]
    train_only_in_receptive_field = True
```
You're then effectively training on `16 * 32` samples per batch (16 fragments per batch, each contributing 32 samples that have a full receptive field), and you get `seconds_in_dataset * 10 / 16` batches per epoch (as governed by the `fragment_stride`).
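For concreteness, here is a back-of-the-envelope check of that arithmetic (a minimal sketch; the receptive-field formula, the batch size of 16, and the example sample rate are my assumptions based on the repo defaults, not quoted from this thread):

```python
# Back-of-the-envelope check of the batch/epoch arithmetic above.
# Assumptions: batch size of 16 and the receptive-field formula below.

def receptive_field(dilation_depth, nb_stacks):
    # Assumed to mirror compute_receptive_field_ in the repo.
    return nb_stacks * (2 ** dilation_depth * 2) - (nb_stacks - 1)

desired_sample_rate = 4000          # hypothetical; use your dataset's rate
dilation_depth, nb_stacks = 7, 4    # values from the named_config above
batch_size = 16                     # assumed default

rf = receptive_field(dilation_depth, nb_stacks)   # 1021 for these settings
fragment_length = 32 + rf                         # 32 trainable samples per fragment
fragment_stride = desired_sample_rate // 10       # a fragment starts every 0.1 s

samples_per_batch = batch_size * 32               # 16 * 32 = 512
seconds_in_dataset = 600                          # hypothetical 10-minute dataset
fragments = seconds_in_dataset * 10               # stride yields 10 fragments per second
batches_per_epoch = fragments // batch_size       # seconds_in_dataset * 10 / 16

print(rf, fragment_length, samples_per_batch, batches_per_epoch)
# -> 1021 1053 512 375
```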
You can then start training with: `python wavenet.py with custom_config`
Good luck!
Hi! I'm training a reasonably small network with 111888 parameters (2 stacks, dilation depth 10, and all other settings left untouched) on a single-speaker dataset (77 MB of 16-bit 44.1 kHz WAV files = 32784 batches of 16). The ETA for one epoch is 42682 s, so almost 12 hours. I'm using a Tesla K80; is that training time in line with your experience? I know for sure I can process 100K samples in half a second with a different implementation (based on TensorFlow). When we say the batch size is 16, are these 16 fragments of 4223 samples = 67568 samples? If that's the case, then I am processing 67K samples in about 30 seconds. Also, is the stride a way of saying our fragments should overlap by 128 samples when the input is split? Cheers
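A quick sanity check of the fragment arithmetic in this question (a sketch; the receptive-field formula is assumed to be `nb_stacks * (2 ** dilation_depth * 2) - (nb_stacks - 1)`, which reproduces the 4223-sample fragment length quoted, and the 128-sample stride mentioned above is also assumed):

```python
# Sanity check of the fragment arithmetic above. The receptive-field
# formula is an assumption, but it reproduces the 4223 quoted.

dilation_depth, nb_stacks = 10, 2
batch_size = 16

rf = nb_stacks * (2 ** dilation_depth * 2) - (nb_stacks - 1)  # 4095
fragment_length = 128 + rf                                    # 4223, as observed
print(batch_size * fragment_length)                           # 67568 samples per batch

# The stride is the spacing between fragment *starts*, not the overlap:
# with fragment_length = 4223 and stride = 128, consecutive fragments
# share fragment_length - stride = 4095 samples.
stride = 128
starts = [i * stride for i in range(3)]   # [0, 128, 256]
overlap = fragment_length - stride        # 4095
print(starts, overlap)
```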