Open guoyingying432 opened 1 year ago
Number of parameters: WavLM: 315.5 M; other parts of the model: 40.8 M.
Inference speed: on average it takes 0.05 seconds to generate 1 second of speech (tested on 4800 utterances totalling 21336.72 seconds).
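To put the speed figure in more familiar terms, it can be read as a real-time factor (RTF): processing time divided by audio duration. The snippet below is only a sketch of that arithmetic; the function name `real_time_factor` is hypothetical, and the total compute time it derives is an estimate from the reported average, not a measured number.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported above: 0.05 s of compute per 1 s of generated speech,
# measured over 21336.72 s of audio in total.
rtf = real_time_factor(0.05, 1.0)
total_audio_seconds = 21336.72

# Estimate (not a measurement): total compute time implied by the average RTF.
estimated_compute_seconds = rtf * total_audio_seconds

print(f"RTF: {rtf:.2f}")
print(f"Estimated total compute time: {estimated_compute_seconds:.1f} s")
```

An RTF below 1 means generation runs faster than real time; at 0.05 the model would be roughly 20 times faster than real time on the test hardware.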
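Thanks a lot! I have one more question: in the original VITS (https://github.com/jaywalnut310/vits), the evaluate step does not compute a loss on the validation set. How can overfitting be avoided in that case, and how should the best model be selected?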
The larger the dataset, the less likely the model is to overfit. A simple way to choose the best checkpoint is to synthesize waveforms with different checkpoints and compare their metrics (say, objective SMOS, the L2 distance between $y$ and $\hat{y}$, etc.). We can also log metrics on the validation set during validation, for example:
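def evaluate(...):
    ...
    # run inference with the generator
    y_hat, ... = generator.module.infer(...)
    # compute each metric on the synthesized audio
    foo = metrics1(y_hat, ...)
    bar = metrics2(y_hat, ...)
    ...
    # add the metric values to the logged scalars
    scalar_dict.update({"metrics1": foo, "metrics2": bar})
    ...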
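The checkpoint-selection idea above could be sketched roughly as follows. This is a minimal, self-contained illustration and not code from the repository: `l2_distance`, `mean_metric`, `select_best_checkpoint`, and the checkpoint names are all hypothetical, waveforms are plain lists of floats, and truncating both signals to a common length is an assumption made here for simplicity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Waveform = Sequence[float]


def l2_distance(y: Waveform, y_hat: Waveform) -> float:
    """Euclidean distance between two waveforms, truncated to the shorter one."""
    n = min(len(y), len(y_hat))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y[:n], y_hat[:n])))


def mean_metric(
    pairs: List[Tuple[Waveform, Waveform]],
    metric: Callable[[Waveform, Waveform], float],
) -> float:
    """Average a metric over (reference, synthesized) pairs from the validation set."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(metric(y, y_hat) for y, y_hat in pairs) / len(pairs)


def select_best_checkpoint(scores: Dict[str, float], lower_is_better: bool = True) -> str:
    """Return the checkpoint name with the best validation score."""
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)


# Toy data standing in for validation outputs of two hypothetical checkpoints.
reference = [0.0, 1.0, 0.0, -1.0]
outputs = {
    "ckpt_a": [0.5, 0.5, 0.5, -0.5],
    "ckpt_b": [0.0, 0.9, 0.1, -1.0],
}
scores = {name: mean_metric([(reference, y_hat)], l2_distance) for name, y_hat in outputs.items()}
best = select_best_checkpoint(scores)
print(scores, best)
```

For metrics where a higher value is better (for example a similarity score), `lower_is_better=False` would be passed instead.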
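Hello, I read the paper and would like to know: roughly how many parameters does the model have, and how long does it take to run? In other words, how much compute time does converting one second of speech take on a 3090 GPU?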