facebookresearch / vocoder-benchmark

A repository for benchmarking neural vocoders by their quality and speed.

feat: waveflow model #9

Open yoyololicon opened 2 years ago

yoyololicon commented 2 years ago

Thanks for making this benchmarking framework. I see that there is currently no flow-based vocoder model. It would be interesting to see how diffusion-based models compare to flow-based models, since they are somewhat similar, so I added an implementation of WaveFlow. The implementation is copied from my constant-memory WaveGlow repo.

ebadawy commented 2 years ago

@yoyololicon, thank you for your contribution. I will have a look at the code. In the meantime, have you had a chance to train the model and get benchmark numbers for it?

yoyololicon commented 2 years ago

@ebadawy not yet. I have just started training the model on LJSpeech and it needs about two weeks on my machine. I can report the numbers after it finishes.

yoyololicon commented 2 years ago

@ebadawy Training finished in 9 days on a single RTX 3060. Here are the benchmark numbers I got on an RTX 3070, using the checkpoint with the lowest validation loss:

{
    "model": "waveflow",
    "checkpoint": "run_2/checkpoints/00950000.ckpt",
    "dataset": "/home/ycy/data-disk/Datasets/LJ/",
    "num_samples": 20,
    "metrics": {
        "ssim": "0.885",
        "mse": "0.002",
        "psnr": "27.515",
        "flops": "138609984000.000",
        "n_params": "5891584.000",
        "rtf": "0.226"
    }
}
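
For anyone reading the report above: the metric values are stored as strings, so they need converting before any arithmetic. The snippet below is only a minimal sketch, not part of the benchmark code. It assumes RTF follows the usual convention of processing time divided by audio duration (the thread does not define it), and the `summarize` helper and its output keys are hypothetical names chosen for illustration.

```python
import json

# Subset of the report posted above; values are strings, as in the original output.
report = json.loads("""
{
    "metrics": {
        "flops": "138609984000.000",
        "n_params": "5891584.000",
        "rtf": "0.226"
    }
}
""")


def summarize(metrics):
    """Convert string metrics to floats and derive easier-to-read units.

    Assumption: RTF = processing time / audio duration, so 1 / RTF is
    how many times faster than real time the model ran.
    """
    values = {name: float(text) for name, text in metrics.items()}
    return {
        "gflops": values["flops"] / 1e9,
        "params_millions": values["n_params"] / 1e6,
        "speed_vs_real_time": 1.0 / values["rtf"],
    }


summary = summarize(report["metrics"])
print(summary)
```

Under that assumption, an RTF of 0.226 corresponds to synthesis running roughly 4.4 times faster than real time on the RTX 3070.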

In the paper, they actually trained for more than 2M steps, but the model had converged pretty well within the first 1M steps, so I would expect little improvement from further training. Generated samples can be downloaded here.

ebadawy commented 2 years ago

It looks good to me. @jasonwFB, can you take a look at this one? I'm not sure whether there is an easy way to fix the import paths and code style to match the internal repo.

yoyololicon commented 2 years ago

@ebadawy Thanks for reviewing the code! I added some updates to the code for faster inference (RTF went from 0.226 to 0.212).
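
To put the RTF change in relative terms, here is a small sketch. It assumes both measurements were taken on the same hardware and the same audio, which the thread does not state explicitly, and the `rtf_change` helper is a hypothetical name used only for illustration.

```python
def rtf_change(before, after):
    """Return (relative RTF reduction, speedup factor) between two runs.

    Assumption: both RTF values come from the same hardware and test set,
    so their ratio reflects the code change alone.
    """
    reduction = (before - after) / before  # fraction of processing time saved
    speedup = before / after               # how many times faster the new code is
    return reduction, speedup


reduction, speedup = rtf_change(0.226, 0.212)
print(f"RTF reduced by {reduction:.1%}; inference is {speedup:.3f}x as fast")
```

By this calculation the update cuts processing time by about 6%.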