Understand fast sampling regression

purzelrakete commented 3 years ago

What

Fast sampling appears to have suffered a regression after embedded inputs were implemented. Verify and/or fix.

Why

Fast generation is the only way to generate audio, and it should produce the same results as simple sampling. While looking at sinusoid generation, I noticed that models trained with embed_inputs=True produced considerably messier samples under fast generation, especially in the utils.decode_random regime. The samples also look less evenly distributed in phase space and clump together around certain phases.

Acceptance Criteria

[x] Verify the problem on the sinusoid training notebook by comparing fast vs simple.
[x] Verify that this problem does not exist when training sinusoids without embed_inputs=True.
[x] Reproduce the problem in a test
[x] Determine the cause and fix it

purzelrakete commented 3 years ago

There is indeed a regression in fast sampling when using embed_inputs=True. See this notebook to compare results between fast and simple sampling when embedding inputs. Re-running this notebook without input embedding yields substantially similar samples when comparing fast and simple.

purzelrakete commented 3 years ago

Resolved. This was due to 0 being a special padding value. The embedding at index 0 is now all zeros.
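
To illustrate why a padding index with a non-zero embedding could make two sampling paths disagree, here is a toy sketch in plain Python. It is not the repository's code: the function names, the summing "convolution", and the assumption that one path pads with zero vectors after embedding while the other pads with index 0 before embedding are all hypothetical, chosen only to show the effect.

```python
# Toy illustration only: not the wavenet repo's code. All names are hypothetical.

def embed(table, indices):
    """Look up one embedding vector per integer index."""
    return [table[i] for i in indices]

def causal_sum(vectors, width):
    """Stand-in for a causal convolution: sum the last `width` vectors per step."""
    out = []
    for t in range(len(vectors)):
        window = vectors[max(0, t - width + 1): t + 1]
        out.append([sum(col) for col in zip(*window)])
    return out

def pad_after_embedding(table, indices, width):
    """Embed first, then left-pad with zero vectors (padding in vector space)."""
    dim = len(table[0])
    padded = [[0.0] * dim] * (width - 1) + embed(table, indices)
    return causal_sum(padded, width)[width - 1:]

def pad_before_embedding(table, indices, width):
    """Left-pad with index 0, then embed (padding in index space)."""
    padded = embed(table, [0] * (width - 1) + list(indices))
    return causal_sum(padded, width)[width - 1:]

seq, width = [1, 2, 1], 3

# Row 0 carries a non-zero vector: the two padding conventions disagree.
learned = [[0.5, -0.2], [0.1, 0.3], [0.7, 0.9]]
outputs_differ = (pad_after_embedding(learned, seq, width)
                  != pad_before_embedding(learned, seq, width))

# Row 0 forced to all zeros: index-0 padding equals zero-vector padding.
zeroed = [[0.0, 0.0]] + learned[1:]
outputs_match = (pad_after_embedding(zeroed, seq, width)
                 == pad_before_embedding(zeroed, seq, width))
```

In PyTorch, `torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=0)` initialises the row at index 0 to zeros and excludes it from gradient updates. Whether the fix here uses that option is not stated in this thread.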

feldberlin / wavenet

Understand fast sampling regression #3