EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca

allow synthesis with only the generator. add everyvoice export #552

Closed roedoejet closed 2 months ago

roedoejet commented 2 months ago

allow synthesis with only the generator checkpoint

fixes https://github.com/EveryVoiceTTS/EveryVoice/issues/424

PR Goal?

This PR lets users "export" vocoder models (everyvoice export), which trims the model down by removing the discriminators. It also allows loading a vocoder model and running inference/synthesis (not training) with only the generator. This lets us distribute roughly 50MB files instead of 950MB files when all people want to do is synthesize (not fine-tune or resume training).
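
As a rough illustration of what trimming a checkpoint down to the generator involves, here is a minimal sketch. It assumes a Lightning-style checkpoint dictionary with a state_dict whose generator weights are keyed with a "generator" prefix, which matches the filter visible in the traceback later in this thread. The function name strip_to_generator and the example keys are hypothetical, and plain lists stand in for tensors so the snippet runs without torch. The actual export code lives in the HiFiGAN_iSTFT_lightning submodule and may differ.

```python
from typing import Any, Dict


def strip_to_generator(ckpt: Dict[str, Any], prefix: str = "generator") -> Dict[str, Any]:
    """Return a copy of ckpt whose state_dict keeps only keys starting with prefix."""
    kept = {k: v for k, v in ckpt["state_dict"].items() if k.startswith(prefix)}
    exported = dict(ckpt)  # shallow copy so the input checkpoint is left untouched
    exported["state_dict"] = kept
    return exported


# Hypothetical checkpoint: key names are illustrative, lists stand in for tensors.
full_ckpt = {
    "state_dict": {
        "generator.conv_pre.weight": [0.1, 0.2],
        "mpd.discriminators.0.weight": [0.3],
        "msd.discriminators.0.weight": [0.4],
    },
    "hyper_parameters": {"sampling_rate": 22050},
}

exported_ckpt = strip_to_generator(full_ckpt)
print(sorted(exported_ckpt["state_dict"]))  # only the generator key remains
```

Because the discriminator weights are only needed to compute the adversarial losses during training, dropping them leaves everything synthesis requires.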

Fixes?

https://github.com/EveryVoiceTTS/EveryVoice/issues/424

Feedback sought?

sanity

Priority?

high - alpha 4

Tests added?

Again, we unfortunately do not have tests for inference.

How to test?

Take an existing vocoder and run everyvoice export spec-to-wav <path_to_vocoder.ckpt>, then try synthesizing using the generated exported.ckpt checkpoint.

Confidence?

medium-high

Version change?

yes, alpha 4

Related PRs?

https://github.com/EveryVoiceTTS/HiFiGAN_iSTFT_lightning/pull/34

semanticdiff-com[bot] commented 2 months ago

Review changes with SemanticDiff.

Analyzed 2 of 3 files.

| Filename | Status |
|---|---|
| :heavy_check_mark: everyvoice/cli.py | Analyzed |
| :heavy_check_mark: everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning | Analyzed |
| :grey_question: .github/workflows/test.yml | Unsupported file format |

github-actions[bot] commented 2 months ago

CLI load time: 0:00.24
Pull Request HEAD: 96bc4d6a7e6eeb19dccd7138054d2a2848188516
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.90%. Comparing base (eb460a2) to head (96bc4d6). Report is 9 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #552      +/-   ##
==========================================
+ Coverage   74.63%   77.90%   +3.27%
==========================================
  Files          46       46
  Lines        3130     3730     +600
  Branches      510      712     +202
==========================================
+ Hits         2336     2906     +570
- Misses        693      716      +23
- Partials      101      108       +7
```


joanise commented 2 months ago

Hmm, something is broken with our pre-commit CI workaround: ecd5478d6d ran isort and broke imports that were previously sorted correctly.

joanise commented 2 months ago

CI failed due to #555. Fixed by excluding submodules from pre-commit run in CI.

This PR now also fixes #555

roedoejet commented 2 months ago

> CI failed due to #555. Fixed by excluding submodules from pre-commit run in CI.
>
> This PR now also fixes #555

Thanks @joanise! Yes, I saw this last week but couldn't figure out what was happening. Thanks for digging into this!

joanise commented 2 months ago

The CI commit is now separate in #557. We can remove it from this branch once that PR is merged; rebasing on the (future) updated main should do it.

marctessier commented 2 months ago

Running some tests.

Wondering in the help message if we should add that the path to the vocoder model is needed as an argument?

everyvoice export --help

 Usage: everyvoice export [OPTIONS] COMMAND [ARGS]...                                                                                           

 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ 
 ┃                                                                Export Help                                                                 ┃ 
 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ 

  • spec-to-wav --- You can export your spec-to-wav model to a much smaller format for synthesis. Advanced: this will export only the           
    generator, leaving the weights of the discriminators behind.                                                                                

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help  -h        Show this message and exit.                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ spec-to-wav   Export and optimize a spec-to-wav model for inference                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

(EveryVoice) [U20-GPSC7]:$ everyvoice export spec-to-wav 
╭───────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────╮
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice/everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning/hfgl/cli.py:94 in export    │
│                                                                                                                                              │
│    91 │                                                                                                                                      │
│    92 │   from .utils import sizeof_fmt                                                                                                      │
│    93 │                                                                                                                                      │
│ ❱  94 │   orig_size = sizeof_fmt(os.path.getsize(model_path))                                                                                │
│    95 │   vocoder_ckpt = torch.load(model_path, map_location=torch.device("cpu"))                                                            │
│    96 │   for k in list(vocoder_ckpt["state_dict"].keys()):                                                                                  │
│    97 │   │   if not k.startswith("generator"):                                                                                              │
│                                                                                                                                              │
│ /home/tes001/u/TxT2SPEECH/miniconda3_u20/envs/EveryVoice/lib/python3.10/genericpath.py:50 in getsize                                         │
│                                                                                                                                              │
│    47                                                                                                                                        │
│    48 def getsize(filename):                                                                                                                 │
│    49 │   """Return the size of a file, reported by os.stat()."""                                                                            │
│ ❱  50 │   return os.stat(filename).st_size                                                                                                   │
│    51                                                                                                                                        │
│    52                                                                                                                                        │
│    53 def getmtime(filename):                                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I am running a few more tests, but it looks good so far :-) I was expecting the exported vocoder to be placed in "./logs_and_checkpoints/VocoderExperiment/base/checkpoints/", but I see it was exported to the directory where I ran the command. That should also be fine, and probably easier.

everyvoice export spec-to-wav  ./logs_and_checkpoints/VocoderExperiment/base/checkpoints/last.ckpt 
2024-09-23 11:40:07.137 | INFO     | everyvoice.model.vocoder.HiFiGAN_iSTFT_lightning.hfgl.cli:export:105 - Checkpoint saved at 'exported.ckpt'. Reduced size from 969.4MiB to 53.3MiB. This checkpoint will only be usable for inference/synthesis, and not for training.
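
The sizes in the log line above come from a sizeof_fmt helper that the traceback shows being imported from the submodule's utils. Its implementation is not shown in this thread, so the following is only a guess at how such a helper could produce binary-unit strings like 53.3MiB.

```python
def sizeof_fmt(num: float, suffix: str = "B") -> str:
    """Format a byte count with binary prefixes (KiB, MiB, ...), one decimal place."""
    for unit in ["", "Ki", "Mi", "Gi", "Ti"]:
        if abs(num) < 1024.0:
            return f"{num:.1f}{unit}{suffix}"
        num /= 1024.0  # move up to the next binary unit
    return f"{num:.1f}Pi{suffix}"


print(sizeof_fmt(1536))     # 1.5KiB
print(sizeof_fmt(1024**2))  # 1.0MiB
```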

roedoejet commented 2 months ago

> Wondering in the help message if we should add that the path to the vocoder model is needed as an argument?

Nice catch, yes the model path is required and should give a better message. I just fixed this.
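
The fix itself is not shown in this thread. As a hedged sketch of the general idea, a command can validate the path before calling os.path.getsize, so that a missing argument produces a readable error instead of the TypeError above. The function name checked_size and the error messages are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional, Union


def checked_size(model_path: Optional[Union[str, Path]]) -> int:
    """Return the file size in bytes, failing early with a readable message."""
    if model_path is None:
        # Without this guard, os.path.getsize(None) raises a TypeError.
        raise ValueError("A path to the vocoder checkpoint is required.")
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No checkpoint file found at '{path}'.")
    return os.path.getsize(path)
```

Another option, if the CLI framework supports it, is to declare the path as a required argument so the framework reports the missing value before the command body runs.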