adelacvg / NS2VC

Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech

Organizing Audio Data for Speaker Recognition: Folders or Single Folder? #31

Open lpscr opened 1 year ago


First, thank you very much for your work. It's amazing!

I have 10 speakers in my dataset. Should I organize each speaker's audio files into separate folders, like this:

├── spk1
│   ├── 1.wav
│   ├── 2.wav
│   └── ...
├── spk2
│   ├── 3.wav
│   ├── 4.wav
│   └── ...
└── spk3
    ├── 5.wav
    ├── 6.wav
    └── ...

Or is it okay to put all the audio files in one folder like this:

├── 1.wav
├── 2.wav
├── 3.wav
├── 4.wav
├── 5.wav
├── 6.wav
└── ...

Does separating each speaker into its own folder help with better recognition during training? Also, I only have about 1 hour of audio data for each of the 10 speakers. Is that enough for effective training?
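
To make the difference between the two layouts concrete: with per-speaker folders, a speaker label can be read from each file's parent directory, whereas a flat folder carries no speaker information unless the filenames encode it. Below is a minimal Python sketch of that idea. The helper name `index_by_speaker` and the `"unknown"` label are hypothetical, chosen for illustration. This is not NS2VC's actual preprocessing code, and whether NS2VC uses folder names at all is exactly what I am asking.

```python
from collections import defaultdict
from pathlib import Path


def index_by_speaker(root):
    """Map each speaker label to its sorted list of .wav paths.

    Hypothetical helper: the label is the name of the folder that directly
    contains the file. Files sitting directly in `root` (the flat layout)
    all end up under the single label "unknown".
    """
    root = Path(root)
    index = defaultdict(list)
    for wav in sorted(root.rglob("*.wav")):
        parent = wav.parent
        # In the flat layout the parent is the dataset root itself,
        # so there is no folder name to use as a speaker label.
        label = parent.name if parent != root else "unknown"
        index[label].append(wav)
    return dict(index)
```

With the first layout this would return one entry per speaker folder (`spk1`, `spk2`, `spk3`). With the second layout, every file would fall under `"unknown"`, so any speaker identity would have to come from somewhere else.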