feldberlin / wavenet

An unconditioned Wavenet implementation with fast generation.

Understand fast sampling regression #3

Closed by purzelrakete 3 years ago

purzelrakete commented 3 years ago

What

Fast sampling appears to have suffered a regression after embedded inputs were implemented. Verify and/or fix.

Why

Fast generation is the only practical way to generate audio, and it should produce the same results as simple sampling. While looking at the sinusoid generation, I noticed that after training with embed_inputs=True, the samples from fast generation were considerably messier, especially in the utils.decode_random regime. The samples also appear less evenly distributed in phase space and clump together around certain phases.

Acceptance Criteria

purzelrakete commented 3 years ago

There is indeed a regression in fast sampling when using embed_inputs=True. See this notebook to compare results between fast and simple sampling when embedding inputs. Re-running the notebook without input embedding yields substantially similar samples from fast and simple sampling.

purzelrakete commented 3 years ago

Resolved. This was due to 0 being a special padding value. Embedding at index 0 is now all zeros.
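
For illustration, here is a minimal pure-Python sketch of why a zero embedding at index 0 can matter. It is not the repository's code. It assumes one plausible mechanism: simple sampling zero-pads the already embedded input, while fast generation starts from a history of index 0 that is then embedded. The names make_embedding, conv_valid, embed_then_pad and pad_then_embed are hypothetical.

```python
import random


def make_embedding(num_classes, dim, padding_idx=None, seed=0):
    """Build a random embedding table; optionally zero the row at padding_idx."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(num_classes)]
    if padding_idx is not None:
        table[padding_idx] = [0.0] * dim
    return table


def conv_valid(frames, weights):
    """Per-channel 1D convolution without padding over a list of vectors."""
    k = len(weights)
    dim = len(frames[0])
    return [
        [sum(w * f[d] for w, f in zip(weights, frames[t:t + k]))
         for d in range(dim)]
        for t in range(len(frames) - k + 1)
    ]


def embed_then_pad(indices, table, weights):
    """Assumed 'simple' path: embed first, then left-pad with zero vectors."""
    dim = len(table[0])
    zeros = [[0.0] * dim for _ in range(len(weights) - 1)]
    return conv_valid(zeros + [table[i] for i in indices], weights)


def pad_then_embed(indices, table, weights):
    """Assumed 'fast' path: history starts as index 0, which is then embedded."""
    padded = [0] * (len(weights) - 1) + list(indices)
    return conv_valid([table[i] for i in padded], weights)


indices = [3, 1, 2]
weights = [0.5, 0.25]  # causal kernel of size 2

# Ordinary embedding: row 0 is non-zero, so the two paths disagree at t=0.
plain = make_embedding(4, 2)
mismatch = embed_then_pad(indices, plain, weights) != pad_then_embed(indices, plain, weights)

# Row 0 forced to zeros: padding with index 0 is the same as zero padding.
fixed = make_embedding(4, 2, padding_idx=0)
match = embed_then_pad(indices, fixed, weights) == pad_then_embed(indices, fixed, weights)
```

Under these assumptions, mismatch and match are both True: the two paths only agree once row 0 of the embedding table is all zeros. In PyTorch, torch.nn.Embedding accepts a padding_idx argument that keeps the chosen row at zero; whether the fix here used it is not stated in this thread.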