A question I'm curious about - Githubissues

guoyingying432 commented 1 year ago

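Hello, I read the paper and would like to know: roughly how many parameters does the model have, and how long does it take to run? In other words, how much compute time does it take to convert one second of speech on a 3090 GPU?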

OlaWod commented 1 year ago

Number of parameters: WavLM: 315.5 M; other parts of the model: 40.8 M.

Inference speed: on average it takes 0.05 seconds to generate 1 second of speech (tested on 4800 utterances with a total duration of 21336.72 seconds).
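For anyone who wants to reproduce a number like this on their own hardware, here is a minimal sketch of measuring the real-time factor (compute seconds per second of generated audio). It is not necessarily how the figure above was measured. `real_time_factor`, `synthesize`, and `sample_rate` are hypothetical names for illustration, and `synthesize` stands in for whatever function returns the waveform samples for one input.

```python
import time
from typing import Any, Callable, Iterable, Sequence


def real_time_factor(
    synthesize: Callable[[Any], Sequence[float]],
    inputs: Iterable[Any],
    sample_rate: int,
) -> float:
    """Return total compute time divided by total generated audio duration."""
    total_compute = 0.0  # seconds spent inside synthesize()
    total_audio = 0.0    # seconds of audio produced
    for x in inputs:
        start = time.perf_counter()
        wav = synthesize(x)  # hypothetical: returns a 1-D sequence of samples
        total_compute += time.perf_counter() - start
        total_audio += len(wav) / sample_rate
    return total_compute / total_audio


if __name__ == "__main__":
    # Dummy "model" that returns one second of silence at 16 kHz.
    rtf = real_time_factor(lambda _: [0.0] * 16000, range(10), 16000)
    print(f"real-time factor: {rtf:.6f}")
```

When timing a model on a GPU, kernels run asynchronously, so the call usually needs a synchronization point (for PyTorch, `torch.cuda.synchronize()`) before each timestamp is read, otherwise the measured time can be too small.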

guoyingying432 commented 1 year ago

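Thanks a lot! I have one more question: in the original VITS (https://github.com/jaywalnut310/vits), the evaluate step does not compute a loss on the validation set. How do you avoid overfitting in that case, and how do you choose the best model?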

OlaWod commented 1 year ago

The larger the dataset, the less likely the model is to overfit. A simple way to choose the best checkpoint is to synthesize waveforms with different checkpoints and compare their metric results (say, objective SMOS, the L2 distance between $y$ and $\hat{y}$, etc.). We can also log metric results on the validation set during validation, for example:

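def evaluate(...):
    ...
    # Synthesize validation outputs with the generator in inference mode
    y_hat, ... = generator.module.infer(...)
    # Compute the metrics to track (metrics1 and metrics2 are placeholders)
    foo = metrics1(y_hat, ...)
    bar = metrics2(y_hat, ...)
    ...
    # Add them to the scalars that are logged for this validation step
    scalar_dict.update({"metrics1": foo, "metrics2": bar})
    ...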
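
The checkpoint-selection idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not code from FreeVC or VITS: `l2_distance`, `mean_metric`, `pick_best_checkpoint`, and the checkpoint names are all hypothetical, and the waveforms are toy lists standing in for $y$ and $\hat{y}$.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Waveform = Sequence[float]


def l2_distance(y: Waveform, y_hat: Waveform) -> float:
    """L2 distance between two waveforms, truncated to the shorter length."""
    n = min(len(y), len(y_hat))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y[:n], y_hat[:n])))


def mean_metric(
    pairs: Sequence[Tuple[Waveform, Waveform]],
    metric: Callable[[Waveform, Waveform], float],
) -> float:
    """Average a metric over (y, y_hat) pairs from the validation set."""
    values = [metric(y, y_hat) for y, y_hat in pairs]
    return sum(values) / len(values)


def pick_best_checkpoint(scores: Dict[str, float], lower_is_better: bool = True) -> str:
    """Return the checkpoint name with the best score."""
    chooser = min if lower_is_better else max
    return chooser(scores, key=scores.get)


if __name__ == "__main__":
    target = [0.0, 1.0, 0.0, -1.0]
    # Hypothetical outputs synthesized from two different checkpoints.
    outputs = {
        "checkpoint_a": [0.5, 0.5, 0.5, -0.5],
        "checkpoint_b": [0.0, 0.9, 0.1, -1.0],
    }
    scores = {
        name: mean_metric([(target, y_hat)], l2_distance)
        for name, y_hat in outputs.items()
    }
    print(pick_best_checkpoint(scores))
```

For a metric where higher is better, pass `lower_is_better=False`.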

OlaWod / FreeVC

A question I'm curious about #5