IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.
https://applio.org
MIT License

[Feature]: AutoPitch - automatic pitch detection #786

Open Bebra777228 opened 1 week ago

Bebra777228 commented 1 week ago

Description

A question about this was asked recently, but in my opinion it was not formulated quite accurately.

I would like to know if you plan to implement automatic pitch detection, similar to how it was implemented in SVC.

I am not sure if this feature is still available in SVC, but about a year and a half ago you could check the 'autopitch' box during inference. This let the model automatically adjust to the pitch of the voice in the source recording, giving a more realistic voice than manual pitch adjustment. The feature was unreliable, but when it worked the results were good. At least, that's how it seemed to me.

Problem

-

Proposed Solution

-

Alternatives Considered

-

aris-py commented 1 week ago

SVC is no longer used; it is too old. Instead we use RVC, which is much better and more up to date.

blaisewf commented 1 week ago

> SVC is no longer used; it is too old. Instead we use RVC, which is much better and more up to date.

But did you read it?

aris-py commented 1 week ago

Sorry, I don't know much English.

kro-ai commented 1 week ago

> SVC is no longer used; it is too old. Instead we use RVC, which is much better and more up to date.

(Originally posted in Spanish:) I think they are asking for the same feature in RVC. I agree with them, it would be very useful! Then, when running inference, you would not need to adjust the semitones; it would do that by itself by detecting the pitch of the input audio.

Sorry for the bad Spanish, this is auto-translated. I just wanted to make sure you understand.

For those who don't speak Spanish: this feature would be very useful because you wouldn't have to adjust the pitch manually, i.e. somewhere between -12 and +12 semitones. It would detect the pitch automatically, so if your model is a male voice and your input is a female voice, it would lower the output pitch to compensate and make it sound more natural. I used this feature all the time in SVC and found it very useful! It didn't always work that well, but it was still useful to have. I'm not sure it would work that well with singing audio, but with speaking audio it's really great.

AznamirWoW commented 1 week ago

For such functionality to work, there should be some kind of record of what the model was trained on. Detecting a max F0 value from the inferred audio can be done, but adjusting the pitch down without knowing what the model is capable of cannot.
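To illustrate the point, here is a minimal sketch of the check that such a record would make possible. It assumes the model ships with the F0 range of its training data, which nothing in this thread confirms exists today; `train_f0_min_hz`, `train_f0_max_hz` and the function name are hypothetical, and the clip's F0 contour is taken to be a list of per-frame values in Hz with 0 marking unvoiced frames.

```python
from typing import Sequence


def shifted_range_fits(
    clip_f0_hz: Sequence[float],
    semitones: float,
    train_f0_min_hz: float,  # hypothetical metadata stored with the model
    train_f0_max_hz: float,  # hypothetical metadata stored with the model
) -> bool:
    """Return True if the clip, shifted by `semitones`, stays inside
    the F0 range the model was trained on."""
    # Unvoiced frames are assumed to be encoded as 0 Hz and are ignored.
    voiced = [f for f in clip_f0_hz if f > 0]
    if not voiced:
        return True  # nothing voiced, so nothing can fall out of range

    # A shift of n semitones multiplies every frequency by 2 ** (n / 12).
    ratio = 2 ** (semitones / 12)
    lowest = min(voiced) * ratio
    highest = max(voiced) * ratio
    return train_f0_min_hz <= lowest and highest <= train_f0_max_hz
```

Without the two range values there is nothing to compare the shifted contour against, which is the gap described above.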

kro-ai commented 1 week ago

> For such functionality to work, there should be some kind of record of what the model was trained on. Detecting a max F0 value from the inferred audio can be done, but adjusting the pitch down without knowing what the model is capable of cannot.

Having looked into it a bit more: the way it works in SVC is that an f0 predictor is trained alongside the main model, which explains why it wouldn't be possible with RVC. It is a shame, as this would be very useful.

tomakorea commented 1 week ago

I also had an excellent experience with SVC's automatic pitch detection. It made the spoken voice really realistic, actually better than RVC or Applio, where a lot of manual editing is often necessary to get a realistic result.

kro-ai commented 1 week ago

> I also had an excellent experience with SVC's automatic pitch detection. It made the spoken voice really realistic, actually better than RVC or Applio, where a lot of manual editing is often necessary to get a realistic result.

Me too, it was really useful for speaking audio. Maybe some brave soul can add this to Applio.

Chilluminati91 commented 1 week ago

Shouldn't this be pretty simple in general? Calculate a mean f0 from the training data. Then calculate the mean f0 of your clip before inference and shift by the difference.
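A minimal sketch of that idea, not taken from Applio or SVC. It assumes the mean f0 of the training data has been saved somewhere (`train_mean_f0_hz` is hypothetical), that the clip's f0 contour has already been extracted as per-frame values in Hz with 0 for unvoiced frames, and that the result is a whole number of semitones clamped to the usual -12 to +12 range. The function names are made up for illustration.

```python
import math
from typing import Sequence


def mean_voiced_f0(f0_hz: Sequence[float]) -> float:
    """Mean f0 in Hz over voiced frames (unvoiced frames assumed to be 0)."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("clip contains no voiced frames")
    return sum(voiced) / len(voiced)


def auto_pitch_shift(
    train_mean_f0_hz: float,  # hypothetical value stored with the model
    clip_f0_hz: Sequence[float],
    max_shift: int = 12,
) -> int:
    """Semitone shift that moves the clip's mean f0 onto the training mean."""
    clip_mean = mean_voiced_f0(clip_f0_hz)
    # Pitch is logarithmic: the distance between two frequencies
    # in semitones is 12 * log2(target / source).
    semitones = 12 * math.log2(train_mean_f0_hz / clip_mean)
    # Round to a whole semitone and clamp to the allowed range.
    return max(-max_shift, min(max_shift, round(semitones)))
```

For example, a clip averaging 220 Hz fed to a model whose training data averaged 110 Hz gives a shift of one octave down. The "difference" has to be taken as a frequency ratio rather than a subtraction in Hz, because semitones are a logarithmic unit.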

kro-ai commented 6 days ago

> Shouldn't this be pretty simple in general? Calculate a mean f0 from the training data. Then calculate the mean f0 of your clip before inference and shift by the difference.

Interesting.

blaisewf commented 1 day ago

@Bebra777228 could you share the code SVC uses for that "AutoPitch" function?

Bebra777228 commented 1 day ago

I can't say precisely which file this is implemented in, but I assume it might be models.py. That file contains the parameter use_automatic_f0_prediction, which might be what you need.

Overall, it's easiest to search through the code. You might find something useful if you use this search link.