Multispeaker support and some changes

IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance

https://applio.org

MIT License


Multispeaker support and some changes #729

Closed: ShiromiyaG closed this 2 months ago

ShiromiyaG commented 2 months ago

Description

This PR adds support for training multispeaker models, along with some other changes.

Motivation and Context

How has this been tested?

Screenshots (if appropriate):

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[X] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

[X] My code follows the code style of this project.
[ ] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
[ ] I have added tests to cover my changes.
[X] All new and existing tests passed.

blaisewf commented 2 months ago

inference works well?

ShiromiyaG commented 2 months ago

@blaisewf Yes

blaisewf commented 2 months ago

ok, core.py is missing things

ShiromiyaG commented 2 months ago

Oh, okay, I'll add them

ShiromiyaG commented 2 months ago

@blaisewf I think I've added everything that was missing

blaisewf commented 2 months ago

extract or preprocess doesn't need something on the core?

ShiromiyaG commented 2 months ago

> extract or preprocess doesn't need something on the core?

No

blaisewf commented 2 months ago

> extract or preprocess doesn't need something on the core?
>
> No

so how do you preprocess a multispeaker dataset

AznamirWoW commented 2 months ago

> extract or preprocess doesn't need something on the core?
>
> No
>
> so how do you preprocess a multispeaker dataset

The user provides the location of the training set, such as C:\training. All acceptable files directly in C:\training are used for SID 0, for compatibility. All child folders with numeric names (we actually need to check that they are int) are used as speaker IDs, and the content of those folders is processed normally. The file index is written with these SID values, plus 2 mute files per speaker.
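To make the layout concrete, here is a rough sketch of the folder scan described in this comment. It is not the code from this PR: the function names `collect_speakers` and `build_index`, the extension list, the `mute.wav` path, and the `path|sid` line format are all hypothetical and only illustrate the idea (root files go to SID 0, integer-named child folders become speaker IDs, and each speaker gets 2 mute entries).

```python
from pathlib import Path

# Assumption: which extensions count as "acceptable" is illustrative only.
AUDIO_EXTS = {".wav", ".flac", ".mp3"}


def is_audio(path: Path) -> bool:
    """Return True for regular files with one of the assumed audio extensions."""
    return path.is_file() and path.suffix.lower() in AUDIO_EXTS


def collect_speakers(root: Path) -> dict[int, list[Path]]:
    """Map speaker ID -> sorted audio files for the layout described above."""
    speakers: dict[int, list[Path]] = {}

    # Files directly in the root belong to SID 0 (compatibility with
    # single-speaker datasets).
    root_files = sorted(p for p in root.iterdir() if is_audio(p))
    if root_files:
        speakers[0] = root_files

    # Child folders whose names are plain ASCII integers are speaker IDs;
    # any other folder is skipped.
    for child in sorted(root.iterdir()):
        name = child.name
        if not child.is_dir() or not (name.isascii() and name.isdigit()):
            continue
        files = sorted(p for p in child.iterdir() if is_audio(p))
        if files:
            speakers.setdefault(int(name), []).extend(files)
    return speakers


def build_index(
    speakers: dict[int, list[Path]],
    mute_path: str = "mute.wav",  # hypothetical placeholder path
    mutes_per_speaker: int = 2,
) -> list[str]:
    """Build simplified 'path|sid' lines, adding mute entries per speaker."""
    lines: list[str] = []
    for sid in sorted(speakers):
        lines += [f"{path}|{sid}" for path in speakers[sid]]
        lines += [f"{mute_path}|{sid}"] * mutes_per_speaker
    return lines
```

With a dataset such as `C:\training\a.wav`, `C:\training\1\b.wav` and `C:\training\extras\c.wav`, this sketch would assign `a.wav` to SID 0 and `b.wav` to SID 1, and ignore `extras` because its name is not an integer.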

blaisewf commented 2 months ago

well, so this needs to be well explained in the docs