TensorSpeech / TensorFlowTTS

😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Unable to open table file ...\model-120000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? #370

Closed: angelo0027 closed this issue 3 years ago

angelo0027 commented 3 years ago

I'm trying to continue training a pretrained model on LJSpeech. Everything loads correctly until I get this error:

Unable to open table file ...\model-120000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

This is the command I launch:

python examples/tacotron2/train_tacotron2.py --train-dir dump__DATASET__/ljspeech/train --dev-dir dump__DATASET__/ljspeech/valid --outdir examples/tacotron2/exp --config examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume examples/tacotron2/pretrained/model-120000.h5

The tacotron model is downloaded from this link: https://drive.google.com/drive/folders/1FQG8XC5c5JJ0jpCUl7Oqu5u2fC8IarP3

Here is my setup (installed packages) and the full console output:

absl-py 0.11.0
appdirs 1.4.4
astunparse 1.6.3
atomicwrites 1.4.0
attrs 20.3.0
audioread 2.1.9
cached-property 1.5.2
cachetools 4.1.1
certifi 2020.6.20
cffi 1.14.3
chardet 3.0.4
click 7.1.2
colorama 0.4.4
cycler 0.10.0
Cython 0.29.21
decorator 4.4.2
Distance 0.1.3
g2p-en 2.1.0
g2pM 0.1.2.5
gast 0.3.3
google-auth 1.23.0
google-auth-oauthlib 0.4.2
google-pasta 0.2.0
grpcio 1.33.2
h5py 2.10.0
idna 2.10
importlib-metadata 2.0.0
inflect 4.1.0
iniconfig 1.1.1
jamo 0.4.1
joblib 0.17.0
Keras-Preprocessing 1.1.2
kiwisolver 1.3.1
librosa 0.8.0
llvmlite 0.31.0
Markdown 3.3.3
matplotlib 3.3.3
nltk 3.5
numba 0.48.0
numpy 1.19.4
oauthlib 3.1.0
opt-einsum 3.3.0
packaging 20.4
Pillow 8.0.1
pip 20.2.4
pluggy 0.13.1
pooch 1.2.0
protobuf 3.13.0
py 1.9.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pyparsing 2.4.7
pypinyin 0.39.1
pytest 6.1.2
python-dateutil 2.8.1
pyworld 0.2.12
PyYAML 5.3.1
regex 2020.11.11
requests 2.25.0
requests-oauthlib 1.3.0
resampy 0.2.2
rsa 4.6
scikit-learn 0.23.2
scipy 1.5.4
setuptools 50.3.1.post20201107
six 1.15.0
SoundFile 0.10.3.post1
tensorboard 2.4.0
tensorboard-plugin-wit 1.7.0
tensorflow-addons 0.11.2
tensorflow-gpu 2.3.1
tensorflow-gpu-estimator 2.3.0
TensorFlowTTS 0.9
termcolor 1.1.0
TextGrid 1.5
threadpoolctl 2.1.0
toml 0.10.2
tqdm 4.51.0
typeguard 2.10.0
Unidecode 1.1.1
urllib3 1.26.1
Werkzeug 1.0.1
wheel 0.35.1
wincertstore 0.2
wrapt 1.12.1
zipp 3.4.0

(tts_tf2) C:\Users\User\Documents\Projects\Project1\TensorFlowTTS>python examples/tacotron2/train_tacotron2.py --train-dir dump__DATASET__/ljspeech/train --dev-dir dump__DATASET__/ljspeech/valid --outdir examples/tacotron2/exp --config examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume examples/tacotron2/pretrained/model-120000.h5
2020-11-14 23:10:22.839056: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-14 23:10:24.311586: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-14 23:10:24.349111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-11-14 23:10:24.356042: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-14 23:10:24.363408: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-14 23:10:24.371156: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-14 23:10:24.375888: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-14 23:10:24.383988: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-14 23:10:24.390570: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-14 23:10:24.402750: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-14 23:10:24.408072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-14 23:10:26.189142: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-14 23:10:26.205440: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x232fa1fad90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-14 23:10:26.210916: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-14 23:10:26.214527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-11-14 23:10:26.223865: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-14 23:10:26.228557: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-14 23:10:26.231396: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-14 23:10:26.234116: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-14 23:10:26.236882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-14 23:10:26.239733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-14 23:10:26.242701: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-14 23:10:26.247974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-14 23:10:26.828224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-14 23:10:26.832033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-11-14 23:10:26.835860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-11-14 23:10:26.838890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4594 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-11-14 23:10:26.849016: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2329c416b60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-14 23:10:26.853487: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2020-11-14 23:12:55,018 (train_tacotron2:406) INFO: hop_size = 256
2020-11-14 23:12:55,018 (train_tacotron2:406) INFO: format = npy
2020-11-14 23:12:55,018 (train_tacotron2:406) INFO: model_type = tacotron2
2020-11-14 23:12:55,019 (train_tacotron2:406) INFO: tacotron2_params = {'dataset': 'ljspeech', 'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-11-14 23:12:55,019 (train_tacotron2:406) INFO: batch_size = 32
2020-11-14 23:12:55,019 (train_tacotron2:406) INFO: remove_short_samples = True
2020-11-14 23:12:55,019 (train_tacotron2:406) INFO: allow_cache = True
2020-11-14 23:12:55,020 (train_tacotron2:406) INFO: mel_length_threshold = 32
2020-11-14 23:12:55,020 (train_tacotron2:406) INFO: is_shuffle = True
2020-11-14 23:12:55,020 (train_tacotron2:406) INFO: use_fixed_shapes = False
2020-11-14 23:12:55,020 (train_tacotron2:406) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-11-14 23:12:55,021 (train_tacotron2:406) INFO: var_train_expr = None
2020-11-14 23:12:55,021 (train_tacotron2:406) INFO: train_max_steps = 200000
2020-11-14 23:12:55,021 (train_tacotron2:406) INFO: save_interval_steps = 2000
2020-11-14 23:12:55,021 (train_tacotron2:406) INFO: eval_interval_steps = 500
2020-11-14 23:12:55,022 (train_tacotron2:406) INFO: log_interval_steps = 200
2020-11-14 23:12:55,022 (train_tacotron2:406) INFO: start_schedule_teacher_forcing = 200001
2020-11-14 23:12:55,022 (train_tacotron2:406) INFO: start_ratio_value = 0.5
2020-11-14 23:12:55,022 (train_tacotron2:406) INFO: schedule_decay_steps = 50000
2020-11-14 23:12:55,022 (train_tacotron2:406) INFO: end_ratio_value = 0.0
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: num_save_intermediate_results = 1
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: train_dir = dump__DATASET__/ljspeech/train
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: dev_dir = dump__DATASET__/ljspeech/valid
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: use_norm = True
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: outdir = examples/tacotron2/exp
2020-11-14 23:12:55,023 (train_tacotron2:406) INFO: config = examples/tacotron2/conf/tacotron2.v1.yaml
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: resume = examples/tacotron2/pretrained/model-120000.h5
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: verbose = 1
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: mixed_precision = False
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: checkpoint =
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: pretrained =
2020-11-14 23:12:55,024 (train_tacotron2:406) INFO: version = 0.9
2020-11-14 23:12:55,025 (train_tacotron2:406) INFO: max_mel_length = 870
2020-11-14 23:12:55,025 (train_tacotron2:406) INFO: max_char_length = 188
2020-11-14 23:13:04.691081: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-14 23:13:05.914415: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-11-14 23:13:06.025560: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218624
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480
_________________________________________________________________
residual_projection (Dense)  multiple                  41040
=================================================================
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240
_________________________________________________________________
2020-11-14 23:13:08.931081: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open examples\tacotron2\pretrained\model-120000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unable to open table file examples\tacotron2\pretrained\model-120000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 494, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 486, in main
    resume=args.resume,
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 850, in fit
    self.load_checkpoint(resume)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 811, in load_checkpoint
    self.ckpt.restore(pretrained_path)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\tracking\util.py", line 2118, in restore
    status = self.read(save_path, options=options)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\tracking\util.py", line 2035, in read
    return self._saver.restore(save_path=save_path, options=options)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1275, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 99, in NewCheckpointReader
    error_translator(e)
  File "C:\Users\User\Documents\Anaconda3\envs\tts_tf2\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 44, in error_translator
    raise errors_impl.DataLossError(None, None, error_message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file examples\tacotron2\pretrained\model-120000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

dathudeptrai commented 3 years ago

@angelo0027 I'm not sure what the bug is =))

angelo0027 commented 3 years ago

> @angelo0027 I'm not sure what the bug is =))

Shouldn't it be able to open an .h5 file? 🤔 By the way, before I installed CUDA it ran on the CPU; after I installed it, I get this error.

angelo0027 commented 3 years ago

Does anyone know why this is happening? That's the file linked from the README, so it should work, I guess?
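
On why an .h5 file triggers this message: the traceback shows the trainer handing the --resume path to a TensorFlow checkpoint restore (self.ckpt.restore), and that reader expects TensorFlow's own checkpoint table format, whereas Keras .h5 weight files are HDF5. A minimal sketch, using only the Python standard library, for checking which kind of file you have. The function name looks_like_hdf5 is hypothetical, and the check assumes the HDF5 signature sits at offset 0, which is the usual case:

```python
# HDF5 files normally begin with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

If looks_like_hdf5("examples/tacotron2/pretrained/model-120000.h5") returns True, the file is probably fine and simply not in the format the checkpoint reader expects.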

dathudeptrai commented 3 years ago

@angelo0027 your command is wrong; it should be:

--resume examples/tacotron2/pretrained/ckpt-120000

angelo0027 commented 3 years ago

> @angelo0027 your command is wrong; it should be:
>
> --resume examples/tacotron2/pretrained/ckpt-120000

That won't do...

tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for examples/tacotron2/pretrained/ckpt-120000

Is there another file I should be downloading? In the checkpoint folder at the link, all are called model-XXXXX.h5
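
The NotFoundError suggests that no files match that prefix. A TensorFlow checkpoint prefix such as ckpt-120000 normally corresponds to a ckpt-120000.index file plus one or more ckpt-120000.data-* shard files, so a folder that holds only model-XXXXX.h5 files has nothing for --resume to pick up. A small sketch for listing what exists under a prefix; checkpoint_files is a hypothetical helper, not part of TensorFlowTTS:

```python
import glob


def checkpoint_files(prefix):
    """List the files on disk that belong to a TensorFlow checkpoint prefix."""
    safe = glob.escape(prefix)  # treat special characters in the path literally
    index = glob.glob(safe + ".index")
    shards = glob.glob(safe + ".data-*")
    return sorted(index + shards)
```

An empty list for examples/tacotron2/pretrained/ckpt-120000 would be consistent with the error above.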

dathudeptrai commented 3 years ago

@angelo0027 you should start a new training run initialized from the pretrained h5 file, so it's not --resume, it's --pretrained:

--pretrained model-120000.h5

angelo0027 commented 3 years ago

> @angelo0027 you should start a new training run initialized from the pretrained h5 file, so it's not --resume, it's --pretrained:
>
> --pretrained model-120000.h5

Ah! I see now. Yes that worked, thanks!
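
To summarize the fix as a sketch: the helper below assembles the training command from this thread and picks --pretrained for an .h5 weights file and --resume for anything else. build_train_command is a hypothetical name, and choosing the flag by file extension is an assumption made for illustration; all other arguments are copied from the original command.

```python
def build_train_command(weights_path):
    """Build the train_tacotron2.py argument list for a given weights path."""
    # Assumption: .h5 files are Keras weights (use --pretrained);
    # any other path is treated as a checkpoint prefix (use --resume).
    flag = "--pretrained" if weights_path.lower().endswith(".h5") else "--resume"
    return [
        "python", "examples/tacotron2/train_tacotron2.py",
        "--train-dir", "dump__DATASET__/ljspeech/train",
        "--dev-dir", "dump__DATASET__/ljspeech/valid",
        "--outdir", "examples/tacotron2/exp",
        "--config", "examples/tacotron2/conf/tacotron2.v1.yaml",
        "--use-norm", "1",
        "--mixed_precision", "0",
        flag, weights_path,
    ]


if __name__ == "__main__":
    # Print the command that worked in this thread.
    print(" ".join(build_train_command("examples/tacotron2/pretrained/model-120000.h5")))
```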