TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Error when saving checkpoint #371

lalimili6 closed this issue 3 years ago

lalimili6 commented 3 years ago

Hi, I got the error below. How can I fix it? Best regards

CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py   --train-dir ./dump_ljspeech/train   --dev-dir ./dump_ljspeech/valid   --outdir ./examples/tacotron2/exp/train.tacotron2.v1   --config ./examples/tacotron2/conf/tacotron2.v1.yaml   --use-norm 1   --mixed_precision 0   --resume ""
2020-11-12 09:56:32.274583: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-12 09:56:33.215461: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-12 09:56:33.235546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:33.235857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-12 09:56:33.235893: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-12 09:56:33.238094: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-12 09:56:33.239987: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-12 09:56:33.240307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-12 09:56:33.242739: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-12 09:56:33.244106: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-12 09:56:33.248597: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-12 09:56:33.248739: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:33.249024: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:33.249221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-12 09:56:34.177622: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-12 09:56:34.183863: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3399905000 Hz
2020-11-12 09:56:34.184165: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4c7b0f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-12 09:56:34.184192: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-12 09:56:34.302248: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.302680: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ccaf40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-12 09:56:34.302706: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GT 1030, Compute Capability 6.1
2020-11-12 09:56:34.302923: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.303212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-12 09:56:34.303251: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-12 09:56:34.303293: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-12 09:56:34.303318: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-12 09:56:34.303342: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-12 09:56:34.303366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-12 09:56:34.303390: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-12 09:56:34.303414: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-12 09:56:34.303494: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.303837: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.304110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-12 09:56:34.304146: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-12 09:56:34.812439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-12 09:56:34.812475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-12 09:56:34.812483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-12 09:56:34.812680: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.812966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-12 09:56:34.813184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1420 MB memory) -> physical GPU (device: 0, name: GeForce GT 1030, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-11-12 09:56:37,789 (train_tacotron2:400) INFO: hop_size = 256
2020-11-12 09:56:37,789 (train_tacotron2:400) INFO: format = npy
2020-11-12 09:56:37,789 (train_tacotron2:400) INFO: model_type = tacotron2
2020-11-12 09:56:37,789 (train_tacotron2:400) INFO: tacotron2_params = {'dataset': 'ljspeech', 'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: batch_size = 2
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: remove_short_samples = True
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: allow_cache = True
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: mel_length_threshold = 32
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: is_shuffle = True
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: use_fixed_shapes = False
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: var_train_expr = None
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: train_max_steps = 200000
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: save_interval_steps = 1
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: eval_interval_steps = 600
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: log_interval_steps = 200
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: start_schedule_teacher_forcing = 200001
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: start_ratio_value = 0.5
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: schedule_decay_steps = 50000
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: end_ratio_value = 0.0
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: num_save_intermediate_results = 1
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: train_dir = ./dump_ljspeech/train
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: dev_dir = ./dump_ljspeech/valid
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: use_norm = True
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: resume = 
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: verbose = 1
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: mixed_precision = False
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: pretrained = 
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: version = 0.0
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: max_mel_length = 1289
2020-11-12 09:56:37,790 (train_tacotron2:400) INFO: max_char_length = 168
2020-11-12 09:56:38.453183: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-12 09:56:39.270607: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
encoder (TFTacotronEncoder)  multiple                  8167936   
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402  
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480   
_________________________________________________________________
residual_projection (Dense)  multiple                  41040     
=================================================================
Total params: 31,915,858
Trainable params: 31,905,618
Non-trainable params: 10,240
_________________________________________________________________
[train]:   0%|                                                                  | 0/200000 [00:00<?, ?it/s]2020-11-12 09:56:42.433861: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
[train]:   0%|                                                     | 1/200000 [00:25<1437:19:00, 25.87s/it]Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 488, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 476, in main
    trainer.fit(
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
    self.run()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 128, in _train_epoch
    self._check_save_interval()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 169, in _check_save_interval
    self.save_checkpoint()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 825, in save_checkpoint
    self._model.save_weights(self.saved_path + "model-{}.h5".format(self.steps))
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2085, in save_weights
    hdf5_format.save_weights_to_hdf5_group(f, self.layers)
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 640, in save_weights_to_hdf5_group
    param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
  File "/home/javad/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 143, in create_dataset
    if '/' in name:
TypeError: a bytes-like object is required, not 'str'
[train]:   0%|                                                     | 1/200000 [00:26<1466:03:49, 26.39s/it]
dathudeptrai commented 3 years ago

@lalimili6 can you share your command line?

lalimili6 commented 3 years ago

@dathudeptrai TensorFlow version: 2.3.1. OS: Ubuntu 20.04.1. GPU:

Tue Nov 17 13:19:59 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 1030     Off  | 00000000:01:00.0 Off |                  N/A |
| 29%   36C    P8    N/A /  30W |    164MiB /  2001MiB |     25%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1192      G   /usr/lib/xorg/Xorg                            21MiB |
|    0      1890      G   /usr/lib/xorg/Xorg                            60MiB |
|    0      2105      G   /usr/bin/gnome-shell                          73MiB |
+-----------------------------------------------------------------------------+

My command is: javad@javad-HP-ProDesk-600-G2-SFF:~/Documents/TensorFlowTTS$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py   --train-dir ./dump_ljspeech/train/   --dev-dir ./dump_ljspeech/valid/   --outdir ./examples/tacotron2/exp/train.tacotron2.v1   --config ./examples/tacotron2/conf/tacotron2.v1.yaml   --use-norm 1   --mixed_precision 0   --resume ""

output:


javad@javad-HP-ProDesk-600-G2-SFF:~/Documents/TensorFlowTTS$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py   --train-dir ./dump_ljspeech/train/   --dev-dir ./dump_ljspeech/valid/   --outdir ./examples/tacotron2/exp/train.tacotron2.v1   --config ./examples/tacotron2/conf/tacotron2.v1.yaml   --use-norm 1   --mixed_precision 0   --resume ""
2020-11-17 16:07:46.666191: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:47.614335: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-17 16:07:47.634757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.635090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-17 16:07:47.635129: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:47.637108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-17 16:07:47.638634: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-17 16:07:47.638882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-17 16:07:47.640629: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-17 16:07:47.641587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-17 16:07:47.645347: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:07:47.645474: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.645758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.645956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-17 16:07:48.831453: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-17 16:07:49.304366: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3399905000 Hz
2020-11-17 16:07:49.304873: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6064220 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-17 16:07:49.304908: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-17 16:07:49.919417: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.919929: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f53670 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-17 16:07:49.919963: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GT 1030, Compute Capability 6.1
2020-11-17 16:07:49.920207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.920540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-17 16:07:49.920581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:49.920615: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-17 16:07:49.920641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-17 16:07:49.920666: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-17 16:07:49.920692: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-17 16:07:49.920728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-17 16:07:49.920754: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:07:49.920835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.921189: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.921479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-17 16:07:49.941003: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:08:08.227068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-17 16:08:08.227117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-17 16:08:08.227138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-17 16:08:08.310725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:08:08.311191: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:08:08.311546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1435 MB memory) -> physical GPU (device: 0, name: GeForce GT 1030, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-11-17 16:15:32,661 (train_tacotron2:400) INFO: hop_size = 256
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: format = npy
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: model_type = tacotron2
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: tacotron2_params = {'dataset': 'ljspeech', 'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: batch_size = 2
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: remove_short_samples = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: allow_cache = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: mel_length_threshold = 32
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: is_shuffle = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: use_fixed_shapes = False
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: var_train_expr = None
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: train_max_steps = 200000
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: save_interval_steps = 1
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: eval_interval_steps = 600
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: log_interval_steps = 200
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: start_schedule_teacher_forcing = 200001
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: start_ratio_value = 0.5
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: schedule_decay_steps = 50000
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: end_ratio_value = 0.0
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: num_save_intermediate_results = 1
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: train_dir = ./dump_ljspeech/train/
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: dev_dir = ./dump_ljspeech/valid/
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: use_norm = True
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: resume = 
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: verbose = 1
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: mixed_precision = False
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: pretrained = 
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: version = 0.0
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: max_mel_length = 1289
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: max_char_length = 168
2020-11-17 16:16:05.624531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:16:29.137582: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
encoder (TFTacotronEncoder)  multiple                  8167936   
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402  
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480   
_________________________________________________________________
residual_projection (Dense)  multiple                  41040     
=================================================================
Total params: 31,915,858
Trainable params: 31,905,618
Non-trainable params: 10,240
_________________________________________________________________
[train]:   0%|                                                                  | 0/200000 [00:00<?, ?it/s]2020-11-17 16:16:43.873594: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
[train]:   0%|                                                     | 1/200000 [00:27<1531:15:58, 27.56s/it]Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 488, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 476, in main
    trainer.fit(
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
    self.run()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 128, in _train_epoch
    self._check_save_interval()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 169, in _check_save_interval
    self.save_checkpoint()
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 825, in save_checkpoint
    self._model.save_weights(self.saved_path + "model-{}.h5".format(self.steps))
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2085, in save_weights
    hdf5_format.save_weights_to_hdf5_group(f, self.layers)
  File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 640, in save_weights_to_hdf5_group
    param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
  File "/home/javad/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 143, in create_dataset
    if '/' in name:
TypeError: a bytes-like object is required, not 'str'
[train]:   0%|                                                     | 1/200000 [00:28<1586:52:17, 28.56s/it]

I set save_interval_steps: 1 so that the model is saved as soon as possible. Best regards

lalimili6 commented 3 years ago

The related h5py issue (https://github.com/h5py/h5py/issues/1732) describes this error. I installed h5py==2.10.0 and that fixed it. Best regards
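
For anyone hitting the same traceback: the failing line in h5py is `if '/' in name:`, and the error message suggests that `name` arrives as `bytes`, which Python cannot search with a `str` needle. The sketch below reproduces that check in isolation and adds a small guard that could be run before training to see whether the installed h5py is a 2.x release, the series reported as working in this thread. The helper names (`is_known_good`, `reproduces_type_error`, `installed_h5py_version`) are hypothetical and are not part of TensorFlowTTS or h5py. The "major version below 3" rule is an assumption drawn from the linked issue, not a tested compatibility matrix.

```python
from importlib import metadata


def is_known_good(version: str) -> bool:
    """Return True for h5py 2.x version strings (assumption: 3.x triggers the error)."""
    major = int(version.split(".")[0])
    return major < 3


def reproduces_type_error() -> bool:
    """Show that searching a bytes object for a str raises TypeError."""
    try:
        # Same shape as the check in the traceback, with a bytes dataset name.
        "/" in b"encoder/kernel:0"
    except TypeError:
        return True
    return False


def installed_h5py_version():
    """Return the installed h5py version string, or None if h5py is absent."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_h5py_version()
    if found is None:
        print("h5py is not installed")
    elif is_known_good(found):
        print(f"h5py {found}: 2.x series, reported as working in this thread")
    else:
        print(f"h5py {found}: newer than 2.x, checkpoint saving may fail as above")
```

If the guard reports a newer release, the workaround this thread settled on is to pin the package, for example with `pip install h5py==2.10.0`.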