EveryVoiceTTS / EveryVoice
Missing duration files for `extract-alignments` #576

Open SamuelLarkin opened 3 weeks ago

SamuelLarkin commented 3 weeks ago

Bug description

According to the aligner's `preprocess` help message, running `python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess config/everyvoice-aligner.yaml` should generate all the files required to run `python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli extract-alignments config/everyvoice-aligner.yaml --no-predict`. However, the duration files are not generated.

If they aren't generated because sox is not installed, a clear warning should be displayed.
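To illustrate the kind of warning meant here, a minimal sketch follows. It assumes that a missing `sox` executable really is what prevents the duration files from being written, which has not been confirmed. The function name `warn_if_sox_missing` is hypothetical and is not part of EveryVoice.

```python
import shutil
import warnings


def warn_if_sox_missing(executable: str = "sox") -> bool:
    """Return True if `executable` is on PATH; otherwise warn and return False.

    Hypothetical helper, not part of EveryVoice. It only checks PATH; it does
    not verify that the installed program actually works.
    """
    if shutil.which(executable) is None:
        warnings.warn(
            f"'{executable}' was not found on PATH; "
            "duration files may not be generated during preprocessing.",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True


if __name__ == "__main__":
    # Run the check before starting any preprocessing step.
    warn_if_sox_missing()
```

Calling such a check at the start of `preprocess` would surface the problem up front rather than at `extract-alignments` time.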

How to reproduce the bug


python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  preprocess \

Extract Alignments

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  extract-alignments \
  config/everyvoice-aligner.yaml \

Error messages and logs

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  extract-alignments \
  config/everyvoice-aligner.yaml \
2024-11-01 12:16:57.455 | INFO     | __main__:extract_alignments:144 - Loading modules for alignment...
  0%|                                                                                     | 0/5000 [00:00<?, ?it/s]
╭─────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/model/aligner/DeepForcedAligner/dfaligner/cli.py:211 in    │
│ extract_alignments                                                                                              │
│                                                                                                                 │
│   208 │   │   speaker = item["speaker"]                                                                         │
│   209 │   │   language = item["language"]                                                                       │
│   210 │   │   tokens = item["tokens"].cpu()                                                                     │
│ ❱ 211 │   │   pred = np.load(                                                                                   │
│   212 │   │   │   save_dir                                                                                      │
│   213 │   │   │   / "duration"                                                                                  │
│   214 │   │   │   / SEP.join([basename, speaker, language, "duration.npy"])                                     │
│                                                                                                                 │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.10/site-packages/numpy/lib/npyio.py:427 in load              │
│                                                                                                                 │
│    424 │   │   │   fid = file                                                                                   │
│    425 │   │   │   own_fid = False                                                                              │
│    426 │   │   else:                                                                                            │
│ ❱  427 │   │   │   fid = stack.enter_context(open(os_fspath(file), "rb"))                                       │
│    428 │   │   │   own_fid = True                                                                               │
│    429 │   │                                                                                                    │
│    430 │   │   # Code to distinguish from NumPy binary files and pickles.                                       │
FileNotFoundError: [Errno 2] No such file or directory:
Loading EveryVoice modules: 100%|████████████████████████████████████████████████████| 6/6 [00:32<00:00,  5.38s/it]
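
A clearer failure mode could be a pre-flight check that lists the missing duration files before `np.load` is called. The sketch below mirrors the path construction visible in the traceback (`save_dir / "duration" / SEP.join([basename, speaker, language, "duration.npy"])`). The value of `SEP`, the `"basename"` item key, and the function name `missing_duration_files` are assumptions made for illustration, not taken from the EveryVoice code.

```python
from pathlib import Path
from typing import Iterable, Mapping

# Assumption: placeholder value. Use the separator EveryVoice actually defines.
SEP = "--"


def missing_duration_files(
    save_dir: Path, items: Iterable[Mapping[str, str]], sep: str = SEP
) -> list[Path]:
    """Return the expected duration files that do not exist under save_dir.

    Hypothetical helper: each item is assumed to provide "basename",
    "speaker" and "language" string fields.
    """
    duration_dir = Path(save_dir) / "duration"
    missing = []
    for item in items:
        name = sep.join(
            [item["basename"], item["speaker"], item["language"], "duration.npy"]
        )
        path = duration_dir / name
        if not path.is_file():
            missing.append(path)
    return missing
```

With `--no-predict`, the command could then stop early with a message such as "N duration files are missing; run prediction first or drop --no-predict" instead of a bare `FileNotFoundError`.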


More info

Help Message

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess --help

 Usage: python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess

 ┃                                                Preprocess Help                                                ┃
 This command will preprocess all of the data you need for use with DeepForcedAligner. For example:

 dfaligner preprocess config/everyvoice-aligner.yaml

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    config_file      FILE  The path to your model configuration file.                                          │
│                             [default: None]                                                                     │
│                             [required]                                                                          │
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --steps        -s      [audio|spec|text]  Which steps of the preprocessor to use. If none are provided, all     │
│                                           steps will be performed.                                              │
│                                           [default: audio, spec, text]                                          │
│ --config-args  -c      TEXT               Override the configuration.                                           │
│                                           [default: None]                                                       │
│ --cpus         -C      INTEGER            How many CPUs to use when preprocessing                               │
│                                           [default: 4]                                                          │
│ --overwrite    -O                         Redo all preprocessing, even if files already exist and aren't        │
│                                           expected to change.                                                   │
│ --debug        -D                         Enable debugging.                                                     │
│ --help         -h                         Show this message and exit.                                           │
joanise commented 3 weeks ago

I also saw the same problem when trying to test #565: I was unable to create a DFA model to see the changes from #565 in action.

joanise commented 3 weeks ago

And while the help messages are already clearer with https://github.com/EveryVoiceTTS/DeepForcedAligner/pull/26 merged in, this issue highlights that some more improvement might still be required: it should be clear how to use dfaligner from its help messages.