Encodec low-latency streamable setup configuration

In the Encodec paper, non-streamable and streamable inference setups are described:

Streamable. For the streamable setup, all padding is put before the first time step. For a transposed convolution with stride s, we output the s first time steps, and keep the remaining s steps in memory for completion when the next frame is available, or discarding it at the end of a stream. Thanks to this padding scheme, the model can output 320 samples (13 ms) as soon as the first 320 samples (13 ms) are received. We replace the layer normalization with statistics computed over the time dimension with weight normalization (Salimans & Kingma, 2016), as the former is ill-suited for a streaming setup ...
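To make the transposed-convolution part of that passage concrete, here is a minimal NumPy sketch of the overlap-add bookkeeping it describes, assuming a single channel and a kernel at least as long as the stride. The names `conv_transpose_1d` and `StreamingConvTranspose1d` are hypothetical and are not taken from the audiocraft code base; this only illustrates the scheme in the quote, not how the repo implements it.

```python
import numpy as np

def conv_transpose_1d(x, kernel, stride):
    """Offline (non-streaming) 1-D transposed convolution, single channel."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for t, value in enumerate(x):
        # Each input step adds a scaled copy of the kernel, shifted by `stride`.
        out[t * stride : t * stride + k] += value * kernel
    return out

class StreamingConvTranspose1d:
    """Hypothetical streaming variant: emit `stride` samples per input step
    and keep the remaining (kernel_size - stride) samples in memory."""

    def __init__(self, kernel, stride):
        self.kernel = np.asarray(kernel, dtype=float)
        self.stride = stride
        # Partial sums that still wait for contributions from future steps.
        self.tail = np.zeros(len(self.kernel) - stride)

    def step(self, value):
        frame = value * self.kernel
        frame[: len(self.tail)] += self.tail   # complete the carried-over samples
        self.tail = frame[self.stride :].copy()  # keep the rest for the next step
        return frame[: self.stride]              # these samples are now final

# Compare streaming output with the offline result (kernel size 2 * stride).
rng = np.random.default_rng(0)
stride = 4
kernel = rng.standard_normal(2 * stride)
x = rng.standard_normal(10)

layer = StreamingConvTranspose1d(kernel, stride)
streamed = np.concatenate([layer.step(v) for v in x])
offline = conv_transpose_1d(x, kernel, stride)
```

With a kernel size of 2s, the carried tail holds exactly s samples, matching "keep the remaining s steps in memory". The streamed output equals the first `len(x) * stride` samples of the offline result; the leftover `layer.tail` is what would be discarded (or flushed) at the end of the stream.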

To the best of my knowledge, Encodec in this repo operates by default in the non-streamable setup with overlapping chunks. It's not clear to me (a) whether the streamable setup is implemented and (b) how to implement it without audible discontinuities between frames.
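For context on (b), the usual idea behind "all padding is put before the first time step" is that a causal convolution can carry its last few input samples across frames, so frame-by-frame output matches a single pass over the whole signal and no overlapping or cross-fading is needed. Below is a minimal NumPy sketch of that idea, assuming a single channel, stride 1, and no dilation. `causal_conv1d` and `StreamingCausalConv1d` are hypothetical names, not the repo's API.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Offline causal convolution: all (k - 1) padding zeros go on the left."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t : t + k] @ kernel for t in range(len(x))])

class StreamingCausalConv1d:
    """Hypothetical streaming variant that carries the last (k - 1) inputs."""

    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        # Starts as zeros, which plays the role of the initial left padding.
        self.history = np.zeros(len(self.kernel) - 1)

    def process(self, chunk):
        k = len(self.kernel)
        buf = np.concatenate([self.history, chunk])
        out = np.array([buf[t : t + k] @ self.kernel for t in range(len(chunk))])
        self.history = buf[len(chunk) :]  # last (k - 1) samples seen so far
        return out

# Feed 320-sample frames (the frame size quoted from the paper).
rng = np.random.default_rng(0)
kernel = rng.standard_normal(7)
signal = rng.standard_normal(320 * 5)

conv = StreamingCausalConv1d(kernel)
frames = signal.reshape(-1, 320)
streamed = np.concatenate([conv.process(f) for f in frames])
offline = causal_conv1d(signal, kernel)
```

Here the concatenated frame outputs are numerically identical to the offline result, so there is no seam at frame boundaries. Whether every layer of the released model can be run this way is exactly the open question in this issue.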

Is the streamable model available?

facebookresearch/audiocraft, issue #411