endink / Mediapipe4u-plugin


w-okada voice-changer #102

Closed oivio closed 1 year ago

oivio commented 1 year ago

Comparison Product

w-okada voice-changer

Feature Category

Enhancement

Feature Description

I have no idea whether this is possible at all, but there is no harm in asking.

There is an amazing free voice changer made by w-okada. It lets you use RVC models with great results, and it comes in versions with and without CUDA.

https://github.com/w-okada/voice-changer

So I wonder whether it would be possible to create a plugin that works in a similar way and supports RVC models. It would definitely be an amazing addition to the current TTS.

Thank you for your amazing work!

endink commented 1 year ago

Oh~~ Thank you for the advice!

I took a rough look at this project; it seems to be a unified inference engine for other projects.

  1. The easiest way is to run its server program and write a client in UE. That makes us depend on its server, but its features come almost for free. This isn't suitable for M4U integration (M4U will keep 0 dependencies), but it may suit specific projects (if you don't mind your app depending on the voice-changer server-side programs).

  2. Rewrite the inference engine in C++ without using any of its code at all, which is probably a lot of work, as I see voice-changer integrates 3 models:

I don't know which of the three models above is the best; if you tell me, I will dig deeper.

oivio commented 1 year ago

Those 3 (or I think it was even 4) models are selectable in that voice-changer.

I can tell you that RVC (Retrieval-based-Voice-Conversion) is currently the best choice, since the communities around it are really big and constantly train new voice models. One of the big Discord servers is "AI HUB", which is easy to find. It has around 400k members, and those members upload free RVC voice models.

But RVC-WebUI itself is just a tool that lets you train a voice model and convert a voice audio file to use that trained voice.

That voice-changer somehow does the voice changing at runtime with only minimal delay, and I think that is the tricky part.

Anyway, thank you for the quick response, and again for the amazing work!

endink commented 1 year ago

I looked at Retrieval-based-Voice-Conversion-WebUI, which seems to be a training framework. I see that it supports exporting ONNX models, which is convenient for UE. I think I'll write some C++ to experiment with it when I have time; that will let me understand its latency and so on, and tell me whether it fits into UE. Right now I am working on a feature that integrates a Chinese LLM model, so RVC integration will not start right away, but if it has ONNX support and low latency, this feature will be put on the M4U road map.
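The latency question above can be framed as a simple budget: a chunk of audio must be converted in less time than the chunk itself lasts, otherwise playback falls behind. The following is a minimal sketch of such a measurement in Python, not the planned C++ experiment. `fake_infer` is a hypothetical placeholder for a real model call, and the chunk size and sample rate are illustrative assumptions rather than values taken from RVC or voice-changer.

```python
import statistics
import time


def chunk_duration_ms(chunk_samples, sample_rate):
    """Length of one audio chunk in milliseconds (the real-time budget)."""
    return 1000.0 * chunk_samples / sample_rate


def measure_latency_ms(infer, chunk, warmup=3, runs=20):
    """Time repeated calls to `infer` and return per-call latencies in ms."""
    # Warm-up calls are discarded so one-time setup cost does not skew results.
    for _ in range(warmup):
        infer(chunk)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(chunk)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def real_time_factor(latency_ms, budget_ms):
    """Below 1.0 means the chunk is processed faster than it plays back."""
    return latency_ms / budget_ms


def fake_infer(chunk):
    # Hypothetical stand-in for a model call; it only scales the samples.
    return [sample * 0.5 for sample in chunk]


if __name__ == "__main__":
    sample_rate = 48000   # assumed sample rate
    chunk_samples = 4800  # assumed chunk size
    budget = chunk_duration_ms(chunk_samples, sample_rate)
    timings = measure_latency_ms(fake_infer, [0.0] * chunk_samples)
    median = statistics.median(timings)
    print(f"budget per chunk: {budget:.1f} ms")
    print(f"median latency:   {median:.3f} ms")
    print(f"real-time factor: {real_time_factor(median, budget):.4f}")
```

Swapping `fake_infer` for an actual inference call would give a first estimate of whether a model keeps up with live audio on a given machine; buffering and audio I/O add further delay that this sketch does not measure.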

endink commented 1 year ago

BTW, M4U will avoid using CUDA as much as possible, because it brings complex compatibility problems and would leave out AMD users. I will use graphics-vendor-neutral technologies as much as possible, such as DML (DirectML).

For example, M4U's whisper model inference uses the GPU, but it works equally for AMD and Nvidia users.

Thank you for the feature request!
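The vendor-neutral preference described above can be sketched as a small selection routine. This is an illustration only, not how M4U actually chooses its backend (the thread does not say). The strings are ONNX Runtime's execution provider identifiers, while `pick_provider` and the preference order are hypothetical.

```python
# Preference order is an assumption: DirectML first, CPU as the fallback.
# CUDA is left out on purpose so no GPU vendor gets special treatment.
PREFERRED_PROVIDERS = ["DmlExecutionProvider", "CPUExecutionProvider"]


def pick_provider(available, preferred=PREFERRED_PROVIDERS):
    """Return the first preferred provider that the runtime reports."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} is available in {available}")


if __name__ == "__main__":
    # In a real application this list would come from the inference runtime.
    reported = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
    print(pick_provider(reported))
```

Keeping the preference list as data makes it easy to add another backend later without changing the selection logic.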

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 15 days with no activity.