IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.
https://applio.org
MIT License

[Feature]: Add support for 44.1khz audio #741

Open sharkeylaser opened 3 days ago

sharkeylaser commented 3 days ago

Description

I was wondering if it would be possible to add support for 44.1khz audio files without having to resample. It's also a common sample rate, so it might be good to have as an option. Thanks for your efforts!

Problem

Currently, 44.1khz audio options are not present in Applio. Given that it's a common sample rate for CDs, I would think it should be an option.

I have a ton of audiobooks on CDs that I would like to work with, hence the request to support 44.1khz audio. I would like to take the narrator of one audiobook and use their voice to replace the narrator's voice of another audiobook.

Proposed Solution

44.1khz audio should be supported without having to resample.

Alternatives Considered

n/a

blaisewf commented 3 days ago

We don't plan to do that. 40khz and 44.1khz are mostly the same; if you want more quality, you can go with 48khz.
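To put numbers on the comparison above, here is a minimal Python sketch (standard library only) that prints, for each sample rate mentioned in this thread, the exact rational factor needed to resample 44.1 kHz audio to it and the resulting Nyquist limit. The helper names are illustrative and are not part of Applio.

```python
from fractions import Fraction

SOURCE_RATE = 44_100  # CD audio, in Hz


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate / 2


def resample_ratio(source_rate: int, target_rate: int) -> Fraction:
    """Exact resampling factor (target / source), reduced to lowest terms."""
    return Fraction(target_rate, source_rate)


for target in (32_000, 40_000, 48_000):
    ratio = resample_ratio(SOURCE_RATE, target)
    print(
        f"{SOURCE_RATE} -> {target} Hz: "
        f"upsample by {ratio.numerator}, downsample by {ratio.denominator}, "
        f"Nyquist limit {nyquist_hz(target):.0f} Hz"
    )
```

Going from 44.1 kHz to 40 kHz lowers the Nyquist limit from 22.05 kHz to 20 kHz, while going to 48 kHz raises it to 24 kHz but cannot add content that was not in the 44.1 kHz source.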

AznamirWoW commented 3 days ago

Additionally, there are no 44.1KHz pretrains.

sharkeylaser commented 2 days ago

It seems that using 48khz causes the voice to become robotic, distorted, and choppy. 40khz sounds better, but sacrifices quality. It would be a real shame if there are truly no plans for such a common sample rate. What would it take to make a 44.1khz pretrain?

ShiromiyaG commented 2 days ago

Well, this sample rate isn't that common; the most common is 32k because of YouTube. As for making a pretrain, it would cost about 40 dollars, BUT the code for making pretrains is currently broken, so even if you had the money, it wouldn't work.

sharkeylaser commented 2 days ago

It's a very common sample rate; every single CD that has ever been released has been 44.1khz, and CDs existed for decades before YouTube. There are billions of hours more of CDs than there are YouTube videos. Okay, why would it cost 40 dollars? And what code is it?

AznamirWoW commented 2 days ago

He meant the cost of training a pretrained model from scratch at 44.1KHz, which takes roughly 100-200 hours on a 4090. Unfortunately, training from scratch currently produces garbage results due to noise introduced by the model.
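As a rough sanity check on the two figures quoted in this thread (about 40 dollars, 100-200 hours on a 4090), the implied cost per GPU hour can be worked out as below. The hourly rate is derived from those two numbers only; nobody in the thread states it, and no actual rental or electricity price is given here.

```python
TOTAL_COST_USD = 40.0      # "about 40 dollars", as quoted in the thread
HOURS_RANGE = (100, 200)   # "100-200 hours", as quoted in the thread


def implied_hourly_rate(total_cost: float, hours: float) -> float:
    """Cost per GPU hour implied by a total budget and a training duration."""
    return total_cost / hours


for hours in HOURS_RANGE:
    rate = implied_hourly_rate(TOTAL_COST_USD, hours)
    print(f"{hours} h of training -> ${rate:.2f} per hour")
```

Under these assumptions the budget corresponds to somewhere between $0.20 and $0.40 per GPU hour.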

sharkeylaser commented 2 days ago

I have a 4080S; I'll do the work. Aside from working code, what needs to be done?

AznamirWoW commented 2 days ago

you need a new config for v2/44100.json with appropriate settings

you need to change config.py to include this file

you need to change tabs/train/train.py to include the new option for sampling rate

then you need to figure out a big data set to train the pretrain on. It is unknown which data set was used for the original RVC training or how it was actually made. Safe to say, multiple hours of varying audio.

train a new model without using a pretrained model (from scratch) until the model reaches an optimal state and no longer produces horizontal lines on the spectrogram.
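The first step above can be sketched as follows. This is a hypothetical illustration only: the field names (`sampling_rate`, `hop_length`, `upsample_rates`) and the chosen values are assumptions modeled on typical HiFi-GAN-style vocoder configs, not the contents of any real Applio file, so the existing configs in the repository should be the actual starting point. One constraint such vocoders generally impose is that the product of the upsampling factors equals the hop length, so that each feature frame expands to exactly one hop of audio samples.

```python
import json
from math import prod


def make_config(sampling_rate: int, hop_length: int, upsample_rates: list[int]) -> dict:
    """Build a hypothetical config fragment and check its internal consistency."""
    if prod(upsample_rates) != hop_length:
        raise ValueError(
            f"product of upsample_rates ({prod(upsample_rates)}) "
            f"must equal hop_length ({hop_length})"
        )
    return {
        "data": {"sampling_rate": sampling_rate, "hop_length": hop_length},
        "model": {"upsample_rates": upsample_rates},
    }


# Assumed choice: hop length 441 = 7 * 7 * 3 * 3 samples per frame.
config_44k = make_config(44_100, 441, [7, 7, 3, 3])

# Frames per second implied by the sample rate and hop length.
frames_per_second = (
    config_44k["data"]["sampling_rate"] / config_44k["data"]["hop_length"]
)

print(json.dumps(config_44k, indent=2))
print(f"frames per second: {frames_per_second}")
```

The remaining steps (registering the file in config.py and adding the option in tabs/train/train.py) depend on Applio's current code and are not sketched here.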

blaisewf commented 2 days ago

the base pretrained model was trained on the VCTK dataset

sharkeylaser commented 1 day ago

Awesome, thanks for the info, @AznamirWoW and @blaisewf ! Once the issue with the code is resolved, I will get it done. Is there an issue open to track that problem, or an ETA for when/if the code will be fixed?

ShiromiyaG commented 1 day ago

There's no ETA, since we don't know what's causing the problem. As for opening an issue, there are already two open:
https://github.com/fumiama/Retrieval-based-Voice-Conversion-WebUI/issues/87
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/2319

sharkeylaser commented 19 hours ago

Alright, I will keep checking in on those issues to see when it's resolved and proceed after. Thanks for the links, @ShiromiyaG !