In the current implementation, the mel inputs passed to the GST module (`targets`) have shape `[B, n_mel_channels, T_out]`, and the GST Reference Encoder reshapes them to `[B, 1, T_out, n_mel_channels]`. Because this is a reshape rather than a transpose of the last two axes, the Reference Encoder does not operate on the original spectrograms.
This PR fixes the shape of the GST inputs and adds an input shape check to `ReferenceEncoder`.
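
To illustrate why the reshape matters, here is a minimal sketch using NumPy as a stand-in for the actual tensors. The sizes, the variable names, and the helper `check_reference_encoder_input` are hypothetical and chosen for illustration; this is not necessarily the check this PR adds.

```python
import numpy as np

# Small illustrative sizes (assumptions, not values from the model config).
B, n_mel_channels, T_out = 2, 3, 4

# Hypothetical mel batch in the layout described above: [B, n_mel_channels, T_out].
mel = np.arange(B * n_mel_channels * T_out, dtype=np.float32)
mel = mel.reshape(B, n_mel_channels, T_out)

# Reshaping only regroups the flat buffer, so the resulting rows
# are no longer individual time frames of the spectrogram.
reshaped = mel.reshape(B, 1, T_out, n_mel_channels)

# Swapping the last two axes first keeps each row equal to one time frame.
transposed = mel.transpose(0, 2, 1)[:, None, :, :]


def check_reference_encoder_input(x, n_mel_channels):
    """Hypothetical shape check: expect [B, T_out, n_mel_channels]."""
    if x.ndim != 3 or x.shape[2] != n_mel_channels:
        raise ValueError(
            f"expected [B, T_out, {n_mel_channels}], got {list(x.shape)}"
        )
    return x
```

Both results have shape `[B, 1, T_out, n_mel_channels]`, but only `transposed` preserves the original frames. A check on the last dimension, as sketched here, cannot detect a wrong layout when `T_out` happens to equal `n_mel_channels`.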