OlaWod / FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
MIT License

A question out of curiosity #5

Open guoyingying432 opened 1 year ago

guoyingying432 commented 1 year ago

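Hello, I have read the paper and would like to ask: roughly how many parameters does the model have, and how long does it take to run? In other words, how much compute time does converting one second of speech take on a 3090 GPU?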

OlaWod commented 1 year ago

Number of parameters:
WavLM: 315.5 M
Other parts of the model: 40.8 M

Inference speed: on average it takes 0.05 seconds to generate 1 second of speech (tested on 4800 utterances totaling 21336.72 seconds of audio).
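To make the speed figure concrete, here is a small arithmetic sketch. It treats the reported 0.05 seconds per second of speech as a real-time factor (compute time divided by audio duration) and estimates the total compute time for the test set. The function name and the derived total are illustrative assumptions, not measurements from the thread.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Return compute time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return compute_seconds / audio_seconds


# Figures reported in the thread.
TOTAL_AUDIO_SECONDS = 21336.72  # total duration of the 4800 test utterances
REPORTED_RTF = 0.05             # seconds of compute per second of speech

# Derived estimate (an assumption based on the average, not a measured value).
estimated_compute_seconds = REPORTED_RTF * TOTAL_AUDIO_SECONDS

# Sanity check: the estimate maps back to the reported factor.
rtf = real_time_factor(estimated_compute_seconds, TOTAL_AUDIO_SECONDS)
```

Under these numbers the whole test set would take roughly 18 minutes of compute, i.e. conversion runs about 20 times faster than real time.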

guoyingying432 commented 1 year ago

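Thanks a lot! I'd also like to ask another question. The original VITS (https://github.com/jaywalnut310/vits) does not compute a loss on the validation set during evaluation, so how can overfitting be avoided, and how should the best model be chosen?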

OlaWod commented 1 year ago

The larger the dataset, the less likely the model is to overfit. A simple way to choose the best checkpoint is to synthesize waveforms with several checkpoints and compare their metrics (say, objective SMOS, L2 distance between $y$ and $\hat{y}$, etc.). We can also log these metrics on the validation set during validation, for example:

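def evaluate(...):
    ...
    # synthesize validation audio with the generator
    y_hat, ... = generator.module.infer(...)
    # metrics1 / metrics2 are placeholders for whichever metrics you choose
    foo = metrics1(y_hat, ...)
    bar = metrics2(y_hat, ...)
    ...
    # add the results so they are logged with the other validation scalars
    scalar_dict.update({"metrics1": foo, "metrics2": bar})
    ...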
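The offline variant of this idea (score each checkpoint on held-out pairs, then keep the best one) could look roughly like the sketch below. It is a minimal pure-Python illustration, not code from FreeVC or VITS: `l2_distance`, `mean_l2`, `select_best_checkpoint`, the checkpoint names, and the toy waveforms are all hypothetical, and it assumes each $y$ and $\hat{y}$ pair is time-aligned (the longer one is truncated to the shorter length).

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (y, y_hat)


def l2_distance(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Euclidean distance over the overlapping samples of y and y_hat."""
    n = min(len(y), len(y_hat))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y[:n], y_hat[:n])))


def mean_l2(pairs: List[Pair]) -> float:
    """Average L2 distance over all validation pairs of one checkpoint."""
    if not pairs:
        raise ValueError("need at least one (y, y_hat) pair")
    return sum(l2_distance(y, y_hat) for y, y_hat in pairs) / len(pairs)


def select_best_checkpoint(results: Dict[str, List[Pair]]) -> Tuple[str, Dict[str, float]]:
    """Return the checkpoint with the lowest mean L2 distance, plus all scores."""
    scores = {name: mean_l2(pairs) for name, pairs in results.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: hypothetical checkpoint names and made-up waveforms.
reference = [0.0, 0.5, 1.0]
results = {
    "checkpoint_a": [(reference, [0.0, 0.0, 0.0])],
    "checkpoint_b": [(reference, [0.0, 0.4, 0.9])],
}
best, scores = select_best_checkpoint(results)
```

Here `checkpoint_b` is selected because its output is closer to the reference. A lower-is-better metric is assumed; for a higher-is-better metric such as a similarity score, `max` would replace `min`.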