Closed purzelrakete closed 3 years ago
There is indeed a regression in fast sampling when using embed_inputs=True. See this notebook to compare results between fast and simple sampling when embedding inputs. Re-running this notebook without input embedding yields substantially similar samples when comparing fast and simple.
Resolved. This was due to 0 being a special padding value. Embedding at index 0 is now all zeros.
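To illustrate the fix described above, here is a minimal sketch of an embedding table whose row 0 is all zeros, so that a padding index embeds to a zero vector. Everything in it (`make_embedding`, `embed`, the table sizes) is hypothetical and written in plain Python; it is not the project's actual code, and the exact way padding entered the fast sampler is not stated in this thread.

```python
import random

def make_embedding(num_classes, dim, padding_idx=0, seed=0):
    """Build a random embedding table whose padding row is all zeros.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_classes)]
    # Index 0 is reserved for padding, so it must embed to the zero vector.
    table[padding_idx] = [0.0] * dim
    return table

def embed(table, indices):
    """Look up one embedding vector per input index."""
    return [table[i] for i in indices]

table = make_embedding(num_classes=4, dim=3)
history = [0, 0, 2, 1]  # two padding steps followed by two real samples
embedded = embed(table, history)
```

With this layout, padded positions contribute nothing after embedding, which matches zero-padding applied to an already embedded signal. In PyTorch, `torch.nn.Embedding` offers a `padding_idx` argument with the same effect; whether the project uses it is an assumption.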
What
Fast sampling appears to have suffered a regression after embedded inputs were implemented. Verify and/or fix.
Why
Fast generation is the only way to generate audio, and it should produce the same results as simple sampling. While looking at the sinusoid generation, I noticed that when training with embed_inputs=True, the resulting samples were considerably messier when using fast generation, especially in the utils.decode_random regimen. Samples also look like they are less evenly distributed in phase space, and clump together around certain phases.
Acceptance Criteria
fast vs simple.
embed_inputs=True.
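The fast vs simple comparison named in the acceptance criteria can be sketched on a toy case. The example below is an assumption-laden illustration, not the project's sampler: it uses a single dilated causal layer with two scalar weights, and the names `simple_forward` and `fast_forward` are hypothetical. The simple path indexes into the full history with zero padding on the left; the fast path reads the same past value from a fixed-length queue that starts out filled with zeros.

```python
from collections import deque

def simple_forward(xs, w_past, w_now, dilation):
    """Compute each output from the full history, zero-padding on the left."""
    out = []
    for t, x in enumerate(xs):
        past = xs[t - dilation] if t >= dilation else 0.0
        out.append(w_past * past + w_now * x)
    return out

def fast_forward(xs, w_past, w_now, dilation):
    """Same layer, but the past input comes from a fixed-length queue."""
    # The queue starts as zeros, playing the role of the left padding.
    queue = deque([0.0] * dilation, maxlen=dilation)
    out = []
    for x in xs:
        past = queue[0]  # oldest entry: the input from `dilation` steps ago
        out.append(w_past * past + w_now * x)
        queue.append(x)  # maxlen evicts the oldest entry automatically
    return out

xs = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5]
simple = simple_forward(xs, w_past=0.3, w_now=0.7, dilation=2)
fast = fast_forward(xs, w_past=0.3, w_now=0.7, dilation=2)

# Largest disagreement between the two paths; it should be zero here.
max_diff = max(abs(a - b) for a, b in zip(simple, fast))
```

The two paths agree only because the queue's initial contents equal the padding value used by the simple path. If inputs are embedded first and the padding index does not map to a zero vector, that equivalence can break, which is consistent with the resolution reported in this thread.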