IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance
https://applio.org
MIT License
1.81k stars, 292 forks

added a fixed reference for the tensorboard audio section #870

Closed AznamirWoW closed 1 week ago

AznamirWoW commented 1 week ago

ItsMe-TJ commented 5 days ago

I am NOT a fan of this.

  1. I'm not sure whether the source audio is female or male, but I assume male: when I was training a female voice, it was hard to tell how well it was training because the output was lower pitched than the actual voice I was training.

  2. Now this is a very specific complaint that only affects a few people (myself included), but I also train non-voice models: guitars, drums, etc. RVC is really good at this, but having the preview audio be speech when you're not training a voice model makes the preview completely useless.

Having a clip from the actual dataset (like it used to be) is WAY better because you get to hear exactly how your model sounds, and it also works when training non-voice models.

I would really love to see this get reverted, or at least made a toggleable option.

AznamirWoW commented 5 days ago

You can delete the reference folder and it will use the voice from the trained model. Meanwhile, I'll think of some way to let you provide your own reference.

ItsMe-TJ commented 5 days ago

Thank you! I think having a reference folder for a custom preview is a great idea, but maybe have it use the dataset by default and add an option in the training tab where you can drop in the audio file you want to use as the reference. Just an idea! That way, if you want a custom preview you can have one, and if you leave the option blank it uses a file from the dataset.
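
To make the suggested fallback order concrete, here is a minimal sketch. It is not Applio's actual code: the function name `pick_preview_source` and both of its parameters are hypothetical, made up only to illustrate "custom file if one is set, otherwise a clip from the dataset".

```python
from pathlib import Path
from typing import Optional, Sequence


def pick_preview_source(
    custom_reference: Optional[Path],
    dataset_files: Sequence[Path],
) -> Optional[Path]:
    """Pick the audio file to use for the training preview.

    Hypothetical helper, not part of Applio. Returns None when there is
    nothing to preview.
    """
    # A custom reference wins, but only if the file actually exists.
    if custom_reference is not None and custom_reference.is_file():
        return custom_reference
    # Option left blank (or file missing): use the first dataset clip.
    if dataset_files:
        return dataset_files[0]
    return None
```

With this order, leaving the option blank keeps the old behaviour of previewing a dataset clip, so non-voice models still get a meaningful preview.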