EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca

Missing duration files for `extract-alignments` #576

Open SamuelLarkin opened 4 hours ago

SamuelLarkin commented 4 hours ago

Bug description

According to the aligner's `preprocess` help message, running `python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess config/everyvoice-aligner.yaml` should generate all the files required to run `python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli extract-alignments config/everyvoice-aligner.yaml --no-predict`. However, the duration files are not generated.

If they aren't generated because sox is not installed, a clear warning should be displayed.
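Here is a minimal sketch of such a warning. It assumes, as the report suggests but does not confirm, that a missing `sox` executable is what prevents the duration files from being written. The helper name `warn_if_sox_missing` is hypothetical and not part of EveryVoice. It relies only on the standard library's `shutil.which` and `logging`.

```python
import logging
import shutil

logger = logging.getLogger(__name__)


def warn_if_sox_missing() -> bool:
    """Return True if a `sox` executable is on PATH; otherwise log a warning and return False.

    Hypothetical helper: whether preprocessing really needs sox to produce
    duration files is an assumption taken from this report.
    """
    if shutil.which("sox") is None:
        logger.warning(
            "sox was not found on your PATH; some preprocessing outputs "
            "(e.g. duration files) may not be generated."
        )
        return False
    return True
```

Calling it at the start of `preprocess` would surface the problem up front, before any files are expected to exist.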

How to reproduce the bug

Preprocess

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  preprocess \
  config/everyvoice-aligner.yaml

Extract Alignments

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  extract-alignments \
  config/everyvoice-aligner.yaml \
  --no-predict

Error messages and logs

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli \
  extract-alignments \
  config/everyvoice-aligner.yaml \
  --no-predict
2024-11-01 12:16:57.455 | INFO     | __main__:extract_alignments:144 - Loading modules for alignment...
  0%|                                                                                     | 0/5000 [00:00<?, ?it/s]
╭─────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/model/aligner/DeepForcedAligner/dfaligner/cli.py:211 in    │
│ extract_alignments                                                                                              │
│                                                                                                                 │
│   208 │   │   speaker = item["speaker"]                                                                         │
│   209 │   │   language = item["language"]                                                                       │
│   210 │   │   tokens = item["tokens"].cpu()                                                                     │
│ ❱ 211 │   │   pred = np.load(                                                                                   │
│   212 │   │   │   save_dir                                                                                      │
│   213 │   │   │   / "duration"                                                                                  │
│   214 │   │   │   / SEP.join([basename, speaker, language, "duration.npy"])                                     │
│                                                                                                                 │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.10/site-packages/numpy/lib/npyio.py:427 in load              │
│                                                                                                                 │
│    424 │   │   │   fid = file                                                                                   │
│    425 │   │   │   own_fid = False                                                                              │
│    426 │   │   else:                                                                                            │
│ ❱  427 │   │   │   fid = stack.enter_context(open(os_fspath(file), "rb"))                                       │
│    428 │   │   │   own_fid = True                                                                               │
│    429 │   │                                                                                                    │
│    430 │   │   # Code to distinguish from NumPy binary files and pickles.                                       │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory:
'/gpfs/fs3c/nrc/dt/sam037/exp/EveryVoice/tiny.lj/263_wrong_checkpoint/preprocessed/duration/LJ008-0036--speaker_0--eng--duration.npy'
Loading EveryVoice modules: 100%|████████████████████████████████████████████████████| 6/6 [00:32<00:00,  5.38s/it]
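The following sketch shows a pre-flight check that `extract-alignments --no-predict` could run before its loop. With it, missing inputs produce one clear message instead of a traceback on the first item. The path layout mirrors the `np.load` call in the traceback. `SEP` is assumed to be `"--"` based on the file name in the error. `duration_path` and `check_duration_files` are hypothetical helpers, and `items` is assumed to be an iterable of `(basename, speaker, language)` tuples.

```python
from pathlib import Path
from typing import Iterable, Tuple

# Assumed separator, inferred from "LJ008-0036--speaker_0--eng--duration.npy".
SEP = "--"


def duration_path(save_dir: Path, basename: str, speaker: str, language: str) -> Path:
    """Build the path that extract-alignments reads, mirroring the np.load call above."""
    return save_dir / "duration" / SEP.join([basename, speaker, language, "duration.npy"])


def check_duration_files(save_dir: Path, items: Iterable[Tuple[str, str, str]]) -> None:
    """Raise a single FileNotFoundError summarizing every missing duration file.

    Hypothetical helper: `items` holds (basename, speaker, language) tuples.
    """
    items = list(items)
    missing = [
        path
        for path in (duration_path(save_dir, *item) for item in items)
        if not path.exists()
    ]
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} of {len(items)} duration files are missing under "
            f"{save_dir / 'duration'} (first missing: {missing[0].name}). "
            "These files must exist before running extract-alignments with --no-predict."
        )
```

Calling `check_duration_files` once before iterating over the dataset would report how many files are missing, rather than failing on the first one.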

Environment

Current environment

```
#- EveryVoice Version:
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed EveryVoice (`conda`, `pip`, source):
```

More info

Help Message

python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess --help

 Usage: python -m everyvoice.model.aligner.DeepForcedAligner.dfaligner.cli preprocess
            [OPTIONS] CONFIG_FILE

 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
 ┃                                                Preprocess Help                                                ┃
 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
 This command will preprocess all of the data you need for use with DeepForcedAligner. For example:

 dfaligner preprocess config/everyvoice-aligner.yaml

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    config_file      FILE  The path to your model configuration file.                                          │
│                             [default: None]                                                                     │
│                             [required]                                                                          │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --steps        -s      [audio|spec|text]  Which steps of the preprocessor to use. If none are provided, all     │
│                                           steps will be performed.                                              │
│                                           [default: audio, spec, text]                                          │
│ --config-args  -c      TEXT               Override the configuration.                                           │
│                                           [default: None]                                                       │
│ --cpus         -C      INTEGER            How many CPUs to use when preprocessing                               │
│                                           [default: 4]                                                          │
│ --overwrite    -O                         Redo all preprocessing, even if files already exist and aren't        │
│                                           expected to change.                                                   │
│ --debug        -D                         Enable debugging.                                                     │
│ --help         -h                         Show this message and exit.                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
joanise commented 2 hours ago

I also saw the same problem when trying to test #565: I was unable to create a DFA model to see the changes from #565 in action.

joanise commented 2 hours ago

And while the help messages are already clearer now that https://github.com/EveryVoiceTTS/DeepForcedAligner/pull/26 has been merged, this issue shows that further improvement may still be needed: the help messages alone should make it clear how to use dfaligner.