KoeAI / LLVC

MIT License
372 stars · 31 forks

The effect of llvc is not as good as rvc #10

Closed · Yaodada12 closed this issue 5 months ago

pauortegariera commented 5 months ago

Hi, sorry for the inconvenience. I desperately need the checkpoints of a pretrained model for this implementation because I want to do fine-tuning. I need help with this, please!

ksadov commented 5 months ago

@pauortegariera You can find our checkpoints here: https://huggingface.co/KoeAI/llvc_models/tree/main/models/checkpoints
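For anyone scripting this, the files in that repo can be fetched directly via Hugging Face's standard `resolve` URL pattern. This is a minimal stdlib-only sketch; the checkpoint filename used in the comment is a placeholder, so substitute a real filename from the repo listing.

```python
import urllib.request

REPO_ID = "KoeAI/llvc_models"

def checkpoint_url(filename: str) -> str:
    # Standard Hugging Face raw-file URL: /<repo>/resolve/<revision>/<path>
    return f"https://huggingface.co/{REPO_ID}/resolve/main/models/checkpoints/{filename}"

def fetch_checkpoint(filename: str, dest: str) -> None:
    # Download the checkpoint file to a local path
    urllib.request.urlretrieve(checkpoint_url(filename), dest)

# fetch_checkpoint("some_checkpoint.pth", "model.pth")  # placeholder filename
```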

ksadov commented 5 months ago

And to answer the original query: LLVC is optimized for low-latency inference on CPU. It operates under fundamentally different constraints than RVC, which is designed for medium-latency inference on GPU.
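To make the latency constraint concrete: a streaming converter must buffer a full chunk (plus any lookahead) before it can emit output, so chunk size puts a hard floor under latency regardless of compute speed. A back-of-envelope sketch at 16 kHz, with illustrative chunk sizes that are not LLVC's actual configuration:

```python
SAMPLE_RATE = 16_000  # LLVC operates on 16 kHz audio

def chunk_latency_ms(chunk_samples: int, lookahead_samples: int = 0) -> float:
    """Minimum algorithmic latency: the time it takes to collect one
    chunk of input (plus lookahead), before any model compute runs."""
    return 1000.0 * (chunk_samples + lookahead_samples) / SAMPLE_RATE

# A 512-sample chunk already costs 32 ms of buffering; a GPU model that
# processes seconds-long utterances at once cannot hit this budget.
print(chunk_latency_ms(512))      # 32.0 ms
print(chunk_latency_ms(160, 80))  # 15.0 ms for a smaller chunk + lookahead
```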

Yaodada12 commented 5 months ago

> And to answer the original query: LLVC is optimized for low-latency inference on CPU. It operates under fundamentally different constraints than RVC, which is designed for medium-latency inference on GPU.

But the effect in your paper is relatively good. I am very confused.

ksadov commented 5 months ago

> > And to answer the original query: LLVC is optimized for low-latency inference on CPU. It operates under fundamentally different constraints than RVC, which is designed for medium-latency inference on GPU.
>
> But the effect in your paper is relatively good. I am very confused.

In the paper, we picked a relatively "easy" target voice (clear audio, level in tone and volume) and made sure to use an RVC checkpoint that produces high-quality converted output. The input that you choose for our model may also differ from the input that we used as examples for our paper. You can get a sense of which voices work well as target voices by listening to the CPU samples provided here (note that the linked audio is at 22.5kHz and our retrained model is at 16kHz). You can also download the checkpoint and test it on your own voice samples, or (if you have Windows) download the Koe desktop app and try the CPU voices there. Choice of mic and how close you are to it have a large impact on conversion quality.
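When comparing the higher-rate demo samples against 16 kHz model output, it helps to resample both to a common rate first so the comparison is fair. A crude numpy-only sketch using linear interpolation (fine for a quick listen; use a proper polyphase resampler such as `scipy.signal.resample_poly` for real evaluation):

```python
import numpy as np

def resample_linear(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    Quick-and-dirty: introduces aliasing, so only use it for informal
    listening comparisons, not for feeding audio into a model."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in    # original sample times (seconds)
    t_out = np.arange(n_out) / sr_out   # target sample times (seconds)
    return np.interp(t_out, t_in, x)

# e.g. one second of ~22 kHz demo audio down to the model's 16 kHz rate:
# y = resample_linear(demo_audio, 22050, 16000)
```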

Yaodada12 commented 5 months ago

> > > And to answer the original query: LLVC is optimized for low-latency inference on CPU. It operates under fundamentally different constraints than RVC, which is designed for medium-latency inference on GPU.
> >
> > But the effect in your paper is relatively good. I am very confused.
>
> In the paper, we picked a relatively "easy" target voice (clear audio, level in tone and volume) and made sure to use an RVC checkpoint that produces high-quality converted output. The input that you choose for our model may also differ from the input that we used as examples for our paper. You can get a sense of which voices work well as target voices by listening to the CPU samples provided here (note that the linked audio is at 22.5kHz and our retrained model is at 16kHz). You can also download the checkpoint and test it on your own voice samples, or (if you have Windows) download the Koe desktop app and try the CPU voices there. Choice of mic and how close you are to it have a large impact on conversion quality.

Thanks for your reply, I will try it. I have another question: how can the student model support multiple speakers? Through the label embedding of Waveformer?
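For context on the question above: one common pattern for multi-speaker support (not necessarily how Waveformer or LLVC implement it) is to look up a learned per-speaker embedding and add it, broadcast over time, to the encoder features, so a single student model can be steered toward different target voices at inference time. A minimal numpy sketch of that additive-conditioning idea, with all sizes and weights purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SPEAKERS, EMBED_DIM, CHANNELS = 4, 16, 32  # illustrative sizes

# Stand-ins for learned parameters (random here, trained in practice):
speaker_table = rng.normal(size=(NUM_SPEAKERS, EMBED_DIM))  # speaker embeddings
proj = rng.normal(size=(EMBED_DIM, CHANNELS))               # embed -> channel bias

def condition(features: np.ndarray, speaker_id: int) -> np.ndarray:
    """Add a projected speaker embedding to (channels, time) features.

    The bias is constant over the time axis, so every frame is nudged
    toward the chosen speaker's identity."""
    bias = speaker_table[speaker_id] @ proj  # shape (CHANNELS,)
    return features + bias[:, None]          # broadcast over time

# feats = encoder_output  # shape (CHANNELS, num_frames)
# conditioned = condition(feats, speaker_id=2)
```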