Closed: roedoejet closed this 2 months ago.
Review changes with SemanticDiff.
Analyzed 2 of 3 files.
| | Filename | Status |
|---|---|---|
| :heavy_check_mark: | everyvoice/cli.py | Analyzed |
| :heavy_check_mark: | everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning | Analyzed |
| :grey_question: | .github/workflows/test.yml | Unsupported file format |
CLI load time: 0:00.24
Pull Request HEAD: 96bc4d6a7e6eeb19dccd7138054d2a2848188516
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 77.90%. Comparing base (eb460a2) to head (96bc4d6). Report is 9 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
Hmm, something is broken with our pre-commit CI workaround: ecd5478d6d ran isort and broke what was previously correctly isorted.
CI failed due to #555. Fixed by excluding submodules from pre-commit run in CI.
This PR now also fixes #555
Thanks @joanise! Yes, I saw this last week but couldn't see what was happening. Thanks for digging into this!
The CI commit is now separate in #557; we can remove it from this branch once that PR is merged. Just rebasing on the (future) updated main should do.
Running some tests.
Wondering if we should mention in the help message that the path to the vocoder model is needed as an argument?
everyvoice export --help
Usage: everyvoice export [OPTIONS] COMMAND [ARGS]...
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Export Help ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
• spec-to-wav --- You can export your spec-to-wav model to a much smaller format for synthesis. Advanced: this will export only the
generator, leaving the weights of the discriminators behind.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ spec-to-wav Export and optimize a spec-to-wav model for inference │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(EveryVoice) [U20-GPSC7]:$ everyvoice export spec-to-wav
╭───────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────╮
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice/everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning/hfgl/cli.py:94 in export │
│ │
│ 91 │ │
│ 92 │ from .utils import sizeof_fmt │
│ 93 │ │
│ ❱ 94 │ orig_size = sizeof_fmt(os.path.getsize(model_path)) │
│ 95 │ vocoder_ckpt = torch.load(model_path, map_location=torch.device("cpu")) │
│ 96 │ for k in list(vocoder_ckpt["state_dict"].keys()): │
│ 97 │ │ if not k.startswith("generator"): │
│ │
│ /home/tes001/u/TxT2SPEECH/miniconda3_u20/envs/EveryVoice/lib/python3.10/genericpath.py:50 in getsize │
│ │
│ 47 │
│ 48 def getsize(filename): │
│ 49 │ """Return the size of a file, reported by os.stat().""" │
│ ❱ 50 │ return os.stat(filename).st_size │
│ 51 │
│ 52 │
│ 53 def getmtime(filename): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
I am running a few more tests, but it looks good so far :-) I think I was expecting the exported vocoder to be placed in "./logs_and_checkpoints/VocoderExperiment/base/checkpoints/", but I see it was exported to the location where I ran the script. That should also be fine, and easier, I think.
everyvoice export spec-to-wav ./logs_and_checkpoints/VocoderExperiment/base/checkpoints/last.ckpt
2024-09-23 11:40:07.137 | INFO | everyvoice.model.vocoder.HiFiGAN_iSTFT_lightning.hfgl.cli:export:105 - Checkpoint saved at 'exported.ckpt'. Reduced size from 969.4MiB to 53.3MiB. This checkpoint will only be usable for inference/synthesis, and not for training.
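(The sizeof_fmt helper imported in the traceback above is not shown in this thread; a generic human-readable size formatter along these lines would produce strings like the 969.4MiB and 53.3MiB in the log. This is an illustrative sketch, not necessarily the project's implementation.)

```python
def sizeof_fmt(num: float, suffix: str = "B") -> str:
    """Format a byte count as a human-readable string, e.g. 969.4MiB."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti"):
        if abs(num) < 1024.0:
            return f"{num:.1f}{unit}{suffix}"
        num /= 1024.0
    return f"{num:.1f}Pi{suffix}"
```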
Nice catch; yes, the model path is required, and the command should give a better message. I just fixed this.
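(For context, the rich-formatted help above suggests the CLI is built with typer; a required checkpoint-path argument with a clearer message could be declared along these lines. This is a minimal sketch, not the actual fix, and the parameter names are hypothetical.)

```python
from pathlib import Path

import typer

app = typer.Typer()


@app.command("spec-to-wav")
def export(
    model_path: Path = typer.Argument(
        ...,  # no default: typer reports "Missing argument" instead of passing None to os.path.getsize
        exists=True,
        dir_okay=False,
        help="Path to the trained spec-to-wav (vocoder) checkpoint to export.",
    ),
):
    """Export and optimize a spec-to-wav model for inference."""
    ...
```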
allow synthesis with only the generator checkpoint
fixes https://github.com/EveryVoiceTTS/EveryVoice/issues/424
PR Goal?
This PR allows users to "export" vocoder models (everyvoice export), which trims down the model by removing the discriminators. It also allows loading the vocoder model for inference/synthesis (not training) using only the generator. This lets us distribute 50MB files instead of 950MB files when all people want to do is synthesize (not fine-tune/resume training).
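(Based on the loop visible in the traceback earlier in this thread, which keeps only state_dict keys starting with "generator", the trimming step presumably amounts to something like this sketch; the function name and default output path are hypothetical.)

```python
import torch


def export_generator_only(model_path: str, output_path: str = "exported.ckpt") -> None:
    """Keep only the generator weights of a vocoder checkpoint; drop the discriminators."""
    vocoder_ckpt = torch.load(model_path, map_location=torch.device("cpu"))
    for k in list(vocoder_ckpt["state_dict"].keys()):
        # Discriminator weights are only needed for training, not for synthesis.
        if not k.startswith("generator"):
            del vocoder_ckpt["state_dict"][k]
    torch.save(vocoder_ckpt, output_path)
```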
Fixes?
https://github.com/EveryVoiceTTS/EveryVoice/issues/424
Feedback sought?
sanity
Priority?
high - alpha 4
Tests added?
Again, we unfortunately do not have tests for inference.
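(That said, the trimming logic itself could be covered without running inference; a sketch of such a test, using a synthetic checkpoint dict with made-up key names:)

```python
import torch


def test_export_keeps_only_generator_weights():
    # Synthetic checkpoint with one generator and one discriminator entry.
    ckpt = {
        "state_dict": {
            "generator.conv_pre.weight": torch.zeros(1),
            "discriminator.layer.weight": torch.zeros(1),
        }
    }
    # Same filtering logic as shown in the hfgl/cli.py traceback above.
    for k in list(ckpt["state_dict"].keys()):
        if not k.startswith("generator"):
            del ckpt["state_dict"][k]
    assert list(ckpt["state_dict"].keys()) == ["generator.conv_pre.weight"]
```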
How to test?
Take an existing vocoder and run
everyvoice export <path_to_vocoder.ckpt>
then try synthesizing using the generated exported.ckpt checkpoint.
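(To double-check the exported file before synthesizing, a quick inspection like this, assuming exported.ckpt was written to the current directory, should show a much smaller file containing only generator keys:)

```python
import os

import torch

ckpt = torch.load("exported.ckpt", map_location="cpu")
print(f"{os.path.getsize('exported.ckpt') / 2**20:.1f} MiB")
# Every remaining state_dict key should belong to the generator.
print(all(k.startswith("generator") for k in ckpt["state_dict"]))
```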
Confidence?
medium-high
Version change?
yes, alpha 4
Related PRs?
https://github.com/EveryVoiceTTS/HiFiGAN_iSTFT_lightning/pull/34